Abstract

Increasing the number of Science, Technology, Engineering and Math (STEM) university graduates is 
considered a key element for long-term productivity and competitiveness in the global economy. Still, little 
is known about what actually drives and shapes students' choices. This paper focusses on secondary school 
students at the very top of the ability distribution and explores the effect of more exposure to science on 
enrolment and persistence in STEM degrees at the university and on the quality of the university attended. 
The paper overcomes the standard endogeneity problems by exploiting the different timing in the 
implementation of a reform that induced secondary schools in the UK to offer more science to high ability 
14 year-old children. Taking more science in secondary school increases the probability of enrolling in a 
STEM degree by 1.5 percentage points and the probability of graduating in these degrees by 3 percentage
points. The results mask substantial gender heterogeneity: while girls are as willing as boys to take
advanced science in secondary school when it is offered, the effect on STEM degrees is entirely driven by
boys. Girls are induced to choose more challenging subjects, but still the most female-dominated ones.
1 Introduction


In the new heavily globalized and innovation-driven economy, increasing the number of Science,
Technology, Engineering and Math (STEM) 1 university graduates is found to generate high
social returns in terms of long-term productivity, growth and competitiveness [Winters, 2014,
Peri et al., 2013, Moretti, 2012, Atkinson and Mayo, 2010, Jones, 2002]. Moreover, a STEM
degree also represents a very profitable private investment for college graduates themselves.
Lifetime earnings of STEM graduates are extremely high [Joseph Altonji and Maurel, 1974, 
Kirkeboen et al., 2016, Hastings et al., 2013, Pavan and Kinsler, 2015, Rendall and Rendall, 
2014, Koedel and Tyhurst, 2010]: Altonji et al. [2012] show that nowadays intra-educational 
income differences are comparable to inter-educational differences. In the US in 2009 the wage 
gap between the average electrical engineer and someone with a degree in general education
was almost identical to the wage gap between the average college graduate and the average 
secondary school graduate. Moreover, graduates in STEM fields earn more independently of 
the quality of the institution they attended [James et al., 1989, Kirkeboen et al., 2016, Arcidi- 
acono et al., 2016]. Non-monetary returns are also high in STEM occupations: Goldin [2014] 
classifies occupations based on their degree of temporal flexibility, i.e. how important it is to 
stay long or particular hours in the office, and STEM occupations are ranked among the first. 
However, despite the high social and private benefits obtained from graduating in STEM
degrees, the general consensus among policy-makers is that the current supply of STEM skills is
insufficient and, when combined with the forecast growth in demand, presents a potentially
significant constraint on future economic activity [UK HM Treasury and BIS, 2010, The
President’s Council of Advisors on Science and Technology, 2012, European Commission, 2010]. 2
Although the governments of many countries have invested very large amounts to induce
more graduates towards STEM [Atkinson and Mayo, 2010], 3 the graduation rate, and even the
level of interest of students in graduating in these degrees, has remained fairly stable since the
’80s [Altonji et al., 2012]. While the literature on choices of the educational level is very wide
and consolidated (starting from the seminal work by Mincer [1974]), there is relatively little
work on choices of the field of study.

This paper evaluates how much of the shortage of STEM graduates can be attributed to
secondary schools, and in particular to the curriculum they offer. Ellison and Swanson [2012] show
that there is large heterogeneity in secondary schools’ effectiveness in developing talents in
technical subjects like math, which is not explained by differences in school composition. I
investigate the role of the secondary school curriculum and seek to understand whether more
exposure to science in secondary school for very high ability students increases by itself the supply
of STEM graduates. Moreover, I explore whether changing the secondary school curriculum and
increasing students’ preparation in science shrinks the gender gap in STEM degree enrollment.

The identification of the effect of studying more science in secondary school is difficult
because of a double selection problem: the selection of students into different schools (based on
the curriculum they offer) and that of students into different courses, within the school they
chose. I address and test both sources of endogeneity: I eliminate the selection into different
courses within the same school by collapsing the analysis at the school level (in the spirit

1 Throughout the paper I define as “STEM” the following degrees: Physical science, Mathematical and
Computer science and Engineering.

2 Overall, STEM employment grew three times more than non-STEM employment over the last twelve years,
and it is expected to grow twice as fast by 2018. According to a report by the Information Technology and
Innovation Foundation [2010], the number of STEM graduates in the US will have to increase by 20-30% by 2016
to meet the projected growth of the economy.

3 The US federal government for instance is considering actions with the objective of increasing STEM
graduates by 34% annually [The President’s Council of Advisors on Science and Technology, 2012].
of Altonji [1995]) and I address the selection of students into different schools, by exploiting
exogenous variation in the timing of the introduction of an advanced science course in English 
secondary schools. The UK government introduced in 2004 an entitlement to study advanced 
science for high ability students at age 14, with the explicit aim of fostering enrollment in 
post-secondary science education. This resulted in a strong increase in the number of schools 
offering advanced science: from 20% in 2002 to 80% in 2011. As a consequence, the share of 
students taking advanced science increased from 4% in 2002 to 20% in 2011 and the increase 
was almost entirely concentrated on high ability students 4 (see Figure 1). Thanks to a novel 
dataset that I obtained by combining different administrative sources from England, I propose 
two alternative identification strategies that approach this type of selection problem from two 
complementary perspectives and I use different sources of variation. The first strategy uses 
within-school variation in the type of courses offered over time and, in the spirit of Joensen and 
Nielsen [2009], it exploits the three year time lag between the moment when students choose 
their secondary school (age 11) and the moment when they choose their field courses (age 14). I 
evaluate the effects on students unexpectedly exposed to the advanced science course, since their 
schools started to offer it only after they chose the school. The second identification strategy 
tests the robustness of my results by using across-school within-neighbourhood variation over 
time: it exploits the fact that schools in England, when oversubscribed, select students based 
on home-to-school distance and schools’ catchment areas vary (unpredictably) over time. My
second instrument therefore uses variation in whether the schools were offering advanced science 
even before the students started to attend their school. 

The empirical findings can be summarized as follows: taking advanced science at age 14 
increases the probability of choosing science at age 16 by 5 percentage points and that of 
enrolling in STEM degrees by about 2 percentage points. Moreover, offering more science 
courses at secondary school does not only induce more students to enroll in STEM degrees but 
it also increases the likelihood that they graduate in these degrees. This is important, given the 
large problem with persistence in this kind of degree [Arcidiacono et al., 2016, Stinebrickner
and Stinebrickner, 2014]. 5

Second, I find that the effect on STEM degrees (in its narrow definition) is concentrated only on 
boys: the gender gap in STEM degrees enrollment widens as a consequence of this policy. This
is not because fewer girls take advanced science at age 14 (boys and girls at
this stage select into advanced science in the same proportion) but because girls, when exposed
to more science in secondary school, even if induced to take more challenging subjects, 6 still
opt for the most female-dominated ones.

Taken together, my findings can inform ongoing debates over government intervention to address 
apparent mismatches and market frictions in the supply and demand of post-secondary fields 
of study. My results suggest that, to reinvigorate STEM education and high-skilled STEM 
education in particular, governments should consider a policy aimed at offering more science 
courses to high ability students during secondary schools. I estimate that the policy I consider 
contributed to one third of the increase in the share of STEM graduates in England between 
2005 and 2010. 

This paper speaks to the growing literature that seeks to explain choices of university de- 
grees. Most of the evidence so far comes from surveys or informational experiments and the 
results are mixed. The most common explanations look at the role of expected earnings; com- 

4 I define high ability students as those who were in the top 30 percent of the primary school grades
distribution. The increase for these students was around 35 percentage points, from 15% to about 50%.

5 There is a problem of persistence in STEM majors also in England: in the cohort starting university in 2011,
out of the 17% of students enrolled in a STEM major, only 17% graduated in the same STEM major within
three years (this figure is 20% on average for the other majors).

6 I define as challenging the subjects usually taken by students achieving very high grades in primary school.





petencies and preparation; self-confidence; preferences and innate ability [Arcidiacono et al., 
2012, Arcidiacono, 2004, Beffy et al., 2012, Stinebrickner and Stinebrickner, 2014, M and Zafar, 
2014]. However, preferences and ability are usually considered to be constant over time, and
it is therefore difficult for policy-makers to shape them; returns to STEM degrees are already
very high, as stated before, and the elasticity of degree choice to expected earnings is found to
be rather low [Beffy et al., 2012]. Moreover, Stinebrickner and Stinebrickner [2014] show that
students start university over-confident, not under-confident, about their scientific ability.
There is, instead, large scope for policies that affect students’ preparation and the
quality of primary and secondary schools. Many scholars [Cameron and Heckman, 2001, Moretti,
2012], indeed, attribute the lack of STEM graduates to the low quality of the US school system.
Some studies look at the effects of school inputs (usually at the university level), like peers
[De Giorgi et al., 2010, Anelli and Peri, 2015], teachers [Scott E. Carrell and West, 2010], teaching
structure [Machin and McNally, 2008] and university coursework [Fricke et al., 2015]. Still,
excluding some recent studies that evaluate the effects of secondary school curricula using
quasi-experimental evidence [Joensen and Nielsen, 2009, 2016, Cortes et al., 2015, Goodman, 2012],
there is little quantitative work on the effects of secondary school courses [Altonji et al., 2012].
This is surprising given that not only does every government have to decide at some point
how to design its country’s secondary school curriculum but also, unlike
other policies such as changes in peers, this is not a zero-sum choice: everybody may potentially
benefit from a well-designed curriculum.

My paper improves on the existing literature in several ways. 7 

First, I address both layers of selection of students into courses. Most studies [Altonji, 1995, 
Levine and Zimmerman, 1995, Betts and Rose, 2004] use across-school variation in the type of
curriculum offered and do not fully address the possible selection of students into schools, based
on the curriculum they offer. Since family background and individual motivation are important
determinants of the choice of both degree and secondary school, the bias in
estimates that do not take into account selection into schools could be important and could 
lead to an overestimation of the effects. I show that, even in my context where the variation 
in curriculum is induced by a policy, adding school-level controls is not enough to eliminate 
selection bias: the inclusion of school fixed effects and the presence of an instrument turn out 
to be crucial to correctly identify the effect of interest. 

Second, the policy I consider allows me to identify the effect of offering more (natural) 
science courses only, because it does not intervene on other subjects. Instead, changes in 
secondary school curricula usually imply a restructuring of many different courses and it is 
difficult to isolate the effect of one single subject [Altonji, 1995, Joensen and Nielsen, 2009, 
2016, Gorlitz and Gravert, 2015, Jia, 2014]. While my treatment also has multiple components, 
since taking advanced science also implies a change in classroom heterogeneity and composition, 8 
I disentangle the curriculum from the peer channel, using an instrument for peers that exploits 
within-school variation over time in the ability of predicted peers, depending on whether the 
school offers advanced science or not. I find that the effect of the advanced science course 
persists even after controlling for changes in peers’ characteristics. This is key to identifying the
exact origin of the effect and therefore to allowing policy-makers to reproduce the policy in other
contexts.

Third, the compliers for my instrument are extremely high ability students: I therefore look
at the effect for those students with potentially very high probability of succeeding in STEM 

7 I mention here papers that look at the effect both on earnings and on degrees, even if most of the literature
looks at earnings without focusing on the effect on the choice of degree.

8 Because the advanced science course provides the possibility of taking a course exclusively attended by other 
very high ability students. 





degrees and of highest interest for policy-makers because they are more likely to make important 
contributions to scientific and technological fields. On the one hand, this is important because
most of the existing empirical studies [Goodman, 2012, Cortes et al., 2015] analyze policies that
affect almost entirely low ability students, not likely to enroll at university at all, or students
for whom taking science is rather costly [Joensen and Nielsen, 2009, 2016]. 9 On the other hand,
it allows me to separately identify the effect on the extensive margin (i.e. the probability of
attending university) from the effect on the intensive margin (i.e. the choice of degree) because,
given that the students affected by the policy I consider would have enrolled at university in
any case, the policy does not have any effect on the probability of continuing to study. Any effect
I find on the choice of degrees is therefore completely generated by changes on the intensive
margin. Moreover, the instrument affects boys and girls in a very similar way, therefore allowing
me to test the gender heterogeneity of the effect without worrying about differences in compliance.

The remainder of this paper is organized as follows. In Section 2, I describe the data, 
the English school system and the reform of the advanced science program in UK secondary 
schools. Section 3 provides an overview of the main identification strategies. Section 4 presents 
the estimated impact of advanced science on post-16 educational outcomes and checks the
identifying assumptions and the robustness of the results. Section 5 inspects the mechanisms 
behind the estimates and, finally, Section 6 concludes. 

2 Data and institutional setting 

2.1 The English school system 

Compulsory education in England is organized in four Key Stages (KS). At the end of each 
stage students are evaluated in standardized national exams. Figure 2 shows a timeline of the 
English educational system. Pupils enter school at age 4, the Foundation Stage, then they
move to Key Stage 1 (KS1), spanning ages 5 and 6, and Key Stage 2 (KS2, from age 7 to age
11). 10 At the end of KS2 children leave primary school and go to secondary school, where they
progress to Key Stage 3 (KS3, age 12-14) and Key Stage 4 (KS4, age 15-16). Admission to
secondary school is based on criteria usually set by the school or by the local council. Usually
schools give priority to children who live close to the school or whose brothers or sisters attend
the school already. At KS4 students start choosing some subjects. 11 In particular, out of
usually between 10 and 12 qualifications, students typically choose between 4 and 6 subjects. 12
At age 16 compulsory education ends and students may continue their secondary studies for 
a further two years. This phase is called Key Stage 5 (age 17-18) and may take place in 
the same secondary school (about 60% of the schools also offer KS5 courses) or in a different 
school. Again, students have many different options: they can choose more vocational or more
academically oriented types of qualifications (the so-called A levels), with slightly less than half of
each cohort undertaking at least one A-level exam at age 18. Students usually take three A
level or equivalent qualifications, 13 and are free to choose any subject. Finally, higher education

9 These studies exploit for instance changes in minimum math requirements across US states over time or 
compare students just below or just above the threshold for attending remedial classes in math and find modest 
effects on earnings, concentrated on low-SES students. In my setting, instead, compliers include also extremely
high ability students, within the same school. 

10 KS1 corresponds to grades 1 and 2 in the US school system, KS2 to grades 3, 4 and 5.

11 A number of different qualification types are available to young people at KS4, varying in their level of 
difficulty. These include: GCSE (the most common qualification in England and the most academic oriented), and 
other more vocational qualifications. I will only consider GCSE qualifications or GCSE equivalent qualifications. 

12 The six compulsory subjects are: English, math, (single) science, information and communication, physical 
education and citizenship. Students in general take overall between 10 and 12 qualifications. 

13 50% of students take between 3 and 3.5 A level equivalent qualifications.
usually begins at age 19 with a three-year bachelor’s degree. Admission to university is usually
based on which subjects were chosen at KS5 and on the grades achieved. 

2.2 Science in secondary school 

While science is a core component of the National Curriculum at KS4, there are several different 
ways to fulfill the requirement. All students are required to study the basic elements of all
three natural sciences (physics, chemistry and biology) and should at least take the so-called
‘single science’ or core science course (which is worth one KS4 qualification). They can, moreover,
choose to take the ‘double science’ course (worth two qualifications), which leads to more
knowledge in all three subjects, or the ‘triple science’ course (which is called advanced science
and is equivalent to taking one full qualification in each of the three natural science subjects).
Finally students can also take more vocational science qualifications. Taking triple science im- 
plies both longer instruction time and the study of more complex science topics. 14 Double 
science and, more recently, triple science provide the standard routes into the fulfillment of KS4 
requirements. 

In 2004 the UK Government published a ten-year investment framework for science and 
innovation [UK Government, 2004]. The framework set out the Government’s ambition for UK
science and innovation over the next decade and emphasized in particular the need for more
graduates in science. Taking triple science was considered extremely important, because “it gives
students the necessary preparation and confidence to go on and study science” (Confederation of
British Industry). The document established an entitlement to study triple science for students
achieving level 6 or above at KS3 science (the students in the top 40% of the grade
distribution). 15 The result was a very large increase in the number of schools offering triple
science. While in 2002 fewer than 20% of schools offered triple science, by 2011 the share
exceeded 80% (see Figure 1). Between 2002 and 2011 the share of students choosing triple
science increased from 4% to 20% and the increase was mostly concentrated among high ability 
students (for whom the share increased from 15% to 50%). 

There are several, mainly supply driven, reasons why the exact timing of the introduction of 
the triple science option differs by schools. First, the lack of specialized teachers. 50% of science 
and math students in English secondary schools are not taught by teachers specialized in the 
subject. For teachers teaching outside their expertise, triple science is particularly demanding 
and they need more time to get familiar with the material. Second, the school size: for small 
schools it is difficult to offer a large number of subjects. With the ten-year investment frame- 
work, the government encouraged new collaborative arrangements with other schools (to jointly 
provide triple science). However, setting these agreements up takes time, many schools need
the support of their Local Education Authority (LEA), and the exact timing of the conclusion of
these agreements is uncertain. Finally, support and pressure on schools to fulfill the entitlement
to triple science were provided at the LEA level. 16 Some LEAs were not as supportive as others
regarding the introduction of triple science: the increase in the share of schools offering triple 
science was very heterogeneous across different LEAs. 

14 In this case students study more difficult topics such as electric current, transformers, some medical applica- 
tion, more quantitative topics in chemistry etc. 

15 In particular the government stated that “all pupils achieving at least level 6 [Level 6 or above is equivalent
to the top 30% of students] at KS3 should be entitled to study triple science at KS4, for example through
collaborative arrangements with other schools”.

16 LEAs organize courses both on how to organize the time schedule to fit the new curriculum and on the new
material covered, and encourage school-to-school learning. There is large heterogeneity in how actively different
LEAs promoted and pushed the introduction of the triple science option in schools. In total there are 152 local
authorities in England.
2.3 Data


By combining different administrative sources, my final dataset follows all students in
maintained schools in England, 17 from primary school until the end of their university career.

I obtain information on students’ demographic characteristics from the Pupil Level Annual
School Census (PLASC), which collects information on students’ gender, ethnicity, Free
School Meal Eligibility (FSM), Special Education Needs (SEN), language group as well as their 
postcodes. The National Pupil Database (NPD) provides instead information on students’ 
attainments in all their Key Stages exams (from KS1 till KS5) as well as on every single sub- 
ject chosen (and the corresponding grade) in KS4 and KS5 and on school characteristics (peer 
groups, types of school, teacher hiring, school location, etc.). From the NPD dataset I obtain
also the information about which courses are offered by each school. In particular, I follow the 
official methodology used by the English Department of Education and I infer that a school 
offers a course if at least one pupil at the school took an assessment in that specific course and 
year. 18 I then link the NPD to the universe of UK university students, the Higher Education 
Statistical Agency (HESA) dataset. The HESA dataset provides information on whether pupils 
progress to university, on their degree, on the institution they attend and on whether they 
graduate and in which degree. I combine these two data sources to create a dataset following 
the entire population of five cohorts of English school children. My sample includes pupils who 
finished compulsory education (took KS4 examinations, at age 16) between the academic years 
2004/2005 and 2009/2010. After 2010, there would be no information on university outcomes, 
because I only have data on university results till 2013. Before 2005, there is no information on 
whether the school was offering triple science when the student applied to the school, because 
the data collection starts in 2002 and there are three years of lag. Using information on the 
secondary school attended by each individual, I match the individual record with school level 
data on whether the school was offering triple science when the student applied and three years 
later, when she had to choose her KS4 subjects. 
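
The offer rule described above (a school counts as offering a course in a given year if at least one pupil was assessed in it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official or the author's code: the function name `infer_offers`, the record layout and the toy records are hypothetical, and the `min_pupils` parameter is meant only to mimic the stricter definitions mentioned in footnote 18.

```python
def infer_offers(assessments, course, min_pupils=1):
    """Return the set of (school, year) pairs in which `course` counts as offered.

    `assessments` is an iterable of (pupil_id, school_id, year, course) records
    (a hypothetical layout). A school-year counts as offering the course when
    at least `min_pupils` distinct pupils were assessed in it.
    """
    pupils = {}
    for pupil_id, school_id, year, taken in assessments:
        if taken == course:
            pupils.setdefault((school_id, year), set()).add(pupil_id)
    return {key for key, ids in pupils.items() if len(ids) >= min_pupils}


# Hypothetical toy records, not real NPD data.
records = [
    ("p1", "A", 2005, "triple_science"),
    ("p2", "A", 2005, "double_science"),
    ("p3", "B", 2005, "double_science"),
    ("p4", "B", 2006, "triple_science"),
    ("p5", "B", 2006, "triple_science"),
]

offers = infer_offers(records, "triple_science")                 # baseline rule
stricter = infer_offers(records, "triple_science", min_pupils=2)  # stricter variant
```

With the baseline rule, school A in 2005 and school B in 2006 both count as offering triple science; requiring at least two assessed pupils keeps only school B in 2006.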

Finally, I impose a set of standard restrictions on the data. First, I exclude special schools,
hospital schools, and schools with a three-tier rather than a two-tier system. Second,
I only use students who can be tracked from KS2 to KS4. 19 This leaves me with approximately 
530,000 students per cohort. 

The data I use are a major improvement over previous studies. While the very detailed
nature of the information needed on subject choices gives particularly large scope for measurement
error problems in survey data, the student administrative datasets usually available in other
countries do not contain some of the elements necessary for this analysis. For instance, most
datasets do not have information on university outcomes, and the few administrative datasets
that do include post-secondary school outcomes refer to rather small countries, relatively
homogeneous in terms of students’ background, and sometimes do not include information on
previous test scores. The large number of observations and the heterogeneity in students’
background available in the English dataset provide me with enough power to run
my analysis accurately and to study the heterogeneity of the effect on subgroups of the population.

17 The dataset refers only to England and it excludes private schools, which, however, educate a small share (7%)
of British children.

18 My results are robust to different definitions (at least 5 pupils, at least 5% of the students, for at least two 
consecutive years etc.) and all different definitions are extremely highly correlated. 

19 I checked whether this selection generates any bias (i.e. is correlated with the instrument) and this is not 
the case. The results are available upon request. 
3 Empirical strategy

3.1 The selection problem 

The main identification challenge when studying the effects of secondary school courses on
post-secondary school outcomes is to correct for selection bias.

To fix ideas, consider the case in which students choose between taking more science in
secondary school ($D = 1$) or not ($D = 0$). The observed choice of university degree ($Y$) can be
linked to potential degrees ($Y_j$, where $j = 1, 0$) and the type of science in secondary school ($D$)
as:

$$Y = Y_0 + D(Y_1 - Y_0) \qquad (1)$$

The OLS estimates of the effect of choosing more science in secondary school can be written
as follows:

$$E(Y \mid D = 1) - E(Y \mid D = 0) = E(Y_1 \mid D = 1) - E(Y_0 \mid D = 0) \qquad (2)$$

The main challenge is that students selecting into certain secondary school courses would have
different potential outcomes in any case, meaning that a simple OLS does not provide the right
counterfactual ($E(Y_0 \mid D = 0) \neq E(Y_0 \mid D = 1)$). In practice there are two layers of selection:
selection of students into schools offering triple science and selection of students into triple 
science, for a given school. 

Let $S$ be a dummy equal to 1 if the school attended by student i offers triple science and 0
otherwise. Then, the OLS estimates can be written as follows:

$$
\begin{aligned}
E(Y \mid D = 1) - E(Y \mid D = 0) ={}& \underbrace{E(Y_1 - Y_0 \mid D = 1, S = 1)}_{\text{ATT}} \\
&+ P(S = 1 \mid D = 0)\,\underbrace{\left[E(Y_0 \mid D = 1, S = 1) - E(Y_0 \mid D = 0, S = 1)\right]}_{\text{selection into courses}} \\
&+ P(S = 0 \mid D = 0)\,\underbrace{\left[E(Y_0 \mid D = 1, S = 1) - E(Y_0 \mid D = 0, S = 0)\right]}_{\text{selection into schools+courses}}
\end{aligned}
$$
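
The steps behind this decomposition can be spelled out as follows. This is a sketch under one assumption that the text leaves implicit: a student can take triple science only in a school that offers it, so that $D = 1$ implies $S = 1$. The rest uses only the law of iterated expectations.

```latex
\begin{align*}
E(Y \mid D = 1) &= E(Y_1 \mid D = 1, S = 1), \\
E(Y \mid D = 0) &= P(S = 1 \mid D = 0)\, E(Y_0 \mid D = 0, S = 1)
                 + P(S = 0 \mid D = 0)\, E(Y_0 \mid D = 0, S = 0).
\end{align*}
% Add and subtract E(Y_0 | D = 1, S = 1), and use
% P(S = 1 | D = 0) + P(S = 0 | D = 0) = 1 to split the added term:
\begin{align*}
E(Y \mid D = 1) - E(Y \mid D = 0)
  &= E(Y_1 - Y_0 \mid D = 1, S = 1) \\
  &\quad + P(S = 1 \mid D = 0)\left[E(Y_0 \mid D = 1, S = 1) - E(Y_0 \mid D = 0, S = 1)\right] \\
  &\quad + P(S = 0 \mid D = 0)\left[E(Y_0 \mid D = 1, S = 1) - E(Y_0 \mid D = 0, S = 0)\right].
\end{align*}
```

The first bracket compares takers and non-takers within schools that offer the course; the second compares takers with students in schools that do not offer it, so it mixes both layers of selection.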

I address the selection problem by tackling the first and the second layer of selection in 
two different ways. Selection of students into courses within the same schools is addressed 
by collapsing the analysis at the school level, since I use instruments that vary only at the 
school-cohort level. Most papers (in the spirit of Altonji [1995]) use the school average curriculum
as an instrument and therefore address this type of selection only. This leaves room, however, for
endogeneity due to selection of students into schools offering different curricula. I address this
other layer of selection in two different ways, which exploit two different types of variation.

3.2 First instrument 

My first identification strategy is based on the following equation: 

$$Y_{ist} = \gamma_1 D_{ist} + \gamma_2 X_{ist} + \delta_s + \delta_t + v_{ist} \qquad (3)$$

where $D_{ist}$ is a dummy equal to 1 if student i in secondary school s, in cohort t takes triple
science and 0 otherwise; $X_{ist}$ are school and student controls; $\delta_s$ are school fixed effects and
$\delta_t$ are year fixed effects. $Y_{ist}$ is the outcome variable, usually a dummy indicating whether the
student takes science at KS5 or at university (and 0 if she does not take science or does not
continue studying). Finally, $v_{ist}$ is the error term.

The school fixed effects take care of time invariant school heterogeneity, such as the overall 
quality of the school, of the students or of the neighbourhood. The time fixed effects absorb 



cohort effects or the presence of policies that uniformly affect the entire school system. Still, 
there may exist time-varying factors, changes in cohort quality in particular, that may bias
my estimates because they may be correlated both with the introduction of triple science in
a school and with the willingness to take science subjects. I therefore use as an instrument for
$D_{ist}$ a dummy equal to one if student i in school s and cohort t was unexpectedly exposed
to the triple science option. I rely on the time span between the time when students choose 
secondary schools (age 11) and the time when they choose their optional subjects (age 14). I 
use as instrument a dummy equal to 1 if school s was not offering triple science when students 
from cohort t applied to secondary schools but starts to offer triple science by the time they 
choose their KS4 subject, three years later. I only include schools not offering triple science 
when students applied. I compare two types of students, a priori identical because they all 
selected schools not offering triple science at age 11: those whose schools unexpectedly started 
to offer triple science by the time they turned 14 (my treatment group) and those whose school 
did not offer triple science when they chose subjects at age 14 (my control group). 20 
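
As an illustration, the construction of this instrument can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for the estimates: the function name `unexpected_exposure`, its arguments, and the simplification that a school keeps offering triple science once it has started are all hypothetical.

```python
from typing import Optional

# Years between applying to secondary school (age 11) and choosing KS4 subjects (age 14).
KS4_CHOICE_LAG = 3


def unexpected_exposure(first_offer_year: Optional[int], choice_year: int) -> Optional[int]:
    """Instrument for one school-cohort cell (illustrative only).

    first_offer_year: first year the school offers triple science
                      (None if it is never observed offering it).
    choice_year:      year in which the cohort chooses KS4 subjects.

    Returns None if the school already offered triple science when the cohort
    applied (such cells are excluded from the sample), 1 if the school started
    offering it between application and subject choice (treatment group), and
    0 if it was still not offering it at subject choice (control group).
    Assumption: once a school offers triple science it keeps offering it.
    """
    application_year = choice_year - KS4_CHOICE_LAG
    if first_offer_year is not None and first_offer_year <= application_year:
        return None  # offered at application: not in the estimation sample
    if first_offer_year is not None and first_offer_year <= choice_year:
        return 1     # unexpectedly exposed
    return 0         # not exposed
```

For a cohort choosing subjects in 2007 (and applying in 2004), a school that first offers triple science in 2006 yields 1, a school that first offers it in 2008 yields 0, and a school that already offered it in 2004 is dropped.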

This strategy mainly relies on two assumptions. 

The first assumption is that the information set at age 11, when students choose their schools, 
is the same in the treatment and in the control group and does not include 
information on whether the school is going to offer triple science in the next three years. This is 
very likely, given the long time lapse and the uncertainty about when exactly teachers, classrooms and 
time schedules would be ready. Moreover, students are not totally free to choose the school they 
want: there are exogenous geographical constraints in choosing schools in England, especially if 
schools are oversubscribed. In Section 4.3, I show that students who decided to enroll in schools 
offering triple science are observationally identical to students who decided to enroll in schools 
not offering triple science: there is no sign of strategic selection of schools based on whether 
the schools offer the advanced science course, even if the information is available to parents and 
students at age 11. 

The second assumption is that schools’ decisions on when exactly to start offering triple science 
are related to supply-driven rather than demand-driven factors: the timing must not 
depend on the quality of the current cohort attending the school. 
In Section 2.2 I described some supply-driven reasons why schools may delay the introduction 
of triple science. In Section 4.3 I show that the timing of the introduction of the triple science 
option is not correlated with (observable) characteristics of current students in the school and 
that school s, before starting to offer triple science, was on the same trend as all other schools. 
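
The logic of instrumenting the triple science dummy with a binary exposure dummy can be illustrated with the Wald ratio: the reduced-form difference in outcomes divided by the first-stage difference in take-up. The sketch below is a simplification that ignores the controls and the school and year fixed effects of equation 3, and the toy data are made up purely for illustration.

```python
from statistics import mean


def wald_estimate(y, d, z):
    """IV estimate with one binary instrument and no controls (illustrative).

    y: outcomes (e.g. 1 if the student enrols in a STEM degree)
    d: treatment dummies (1 if the student takes triple science)
    z: instrument dummies (1 if unexpectedly exposed to the option)
    Returns (reduced form, first stage, Wald ratio).
    """
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    d1 = [di for di, zi in zip(d, z) if zi == 1]
    d0 = [di for di, zi in zip(d, z) if zi == 0]
    reduced_form = mean(y1) - mean(y0)  # effect of exposure on the outcome
    first_stage = mean(d1) - mean(d0)   # effect of exposure on take-up
    if first_stage == 0:
        raise ValueError("the instrument does not shift the treatment")
    return reduced_form, first_stage, reduced_form / first_stage


# Hypothetical toy data: 8 control and 8 treated students.
z = [0] * 8 + [1] * 8
d = [0] * 8 + [1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0]
reduced_form, first_stage, iv = wald_estimate(y, d, z)
```

In this toy example exposure raises take-up by 0.25 and the outcome by 0.125, so the implied effect of taking triple science is 0.5. The numbers have no connection to the estimates reported in the tables.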

3.3 Second instrument 

Still, even if there is no evidence that schools decide when to offer triple science depending on 
observable characteristics of their current cohort, it may still be that unobservable characteristics 
matter. This is impossible to test. My second instrument however is not subject to this 
last concern because it exploits variation in available courses that existed even before current 
students started to attend their secondary schools. This excludes the possibility that the choice 
of offering triple science depends on specific shocks to the particular cohort in the school. 

This instrument compares students living in the same neighbourhood but who are more or 
less likely to enroll in schools offering triple science, because of exogenous changes in schools’ 
catchment areas. 

20 A similar idea, with only one year lag, has been used in Joensen and Nielsen [2009, 2016], to evaluate the 
effects of increasing secondary school curriculum flexibility, that induced students to take more math at secondary 
school in Denmark. I study a different policy that affects very high ability students and identifies the effect of 
more science only. Thanks to the availability of data on previous test scores and of many cohorts, I am able to 
use within-school variation and to explore in more detail the effect on choices of university degrees. 

I exploit the fact that schools in England, when oversubscribed, usually prioritize students 
based on geographical distance. 21 Therefore, in each year there will be a maximum distance 
between the school and the students’ addresses above which students will not be accepted. I 
build my instrument in two steps: first, I compute the school catchment area for each year, 
i.e. the area delimited by the circle whose centre is the school and whose radius is the maximum 
observed home-to-school distance, 22 and I define the set of ‘reachable’ schools for each student. Second, 
I compute the share of ‘reachable’ schools that offered triple science when student i applied. 
Figure 3 shows how the instrument is constructed. Student address refers to the lower level 
output area (LLOA) 23 where student i used to live at age 10. Around i's house there are two 
schools with different catchment areas, whose radius is indicated by the black dashed line. The 
instrument used in this section of the analysis is the share of schools, out of the set of schools 
reachable by student i in year t, that offered triple science when i applied to secondary school (in 
this case the instrument in year t - 1 was 1 and in year t was 0.5). The instrument varies both 
because of (unpredictable) variations in schools’ catchment areas and because of the overall 
increase in the number of schools offering triple science. I estimate the following equation: 


$$Y_{ipt} = \phi_1 D_{ipt} + \phi_2 X_{ipt} + \theta_t + \theta_p + \nu_{ipt} \qquad (4)$$

where $D_{ipt}$ is the usual dummy equal to 1 if student i in year t, who used to live in 
neighbourhood p when she was 10 years old, takes triple science and 0 otherwise; $X_{ipt}$ are 
individual controls; $\theta_t$ and $\theta_p$ are cohort and neighbourhood fixed effects respectively; $\nu_{ipt}$ 
is the error term. 

I then instrument $D_{ipt}$ using the share of schools reachable by student i, residing in 
block p, that were offering triple science in year t, when i applied to secondary school ($z_{pt}$). 
This instrument compares students attending schools that offer triple science with students 
attending schools not offering it, i.e. it uses across school within neighbourhood variation 
(instead of within school over time variation). Offering triple science is likely to be related to 
other school characteristics, like school quality, that may directly affect the choices of degree at 
the university. This issue may be more relevant when we use across school rather than within 
school variation because differences in quality across schools are likely to be much more sizable 
than differences within schools over time. Section 4.4 addresses this concern by including as a 
control the average quality of the set of ‘reachable’ schools in each catchment area over time. 
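
The two-step construction of the second instrument (catchment circles, then the share of reachable schools offering triple science) can be sketched as follows. This is an illustrative sketch only: the `School` class, the planar coordinates in kilometres and the example numbers are assumptions, and the catchment radius is taken as given rather than computed from trimmed home-to-school distances.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class School:
    name: str
    x_km: float                  # hypothetical planar coordinates of the school
    y_km: float
    radius_km: float             # catchment radius in the application year
    offers_triple_science: bool  # status in the application year


def reachable_share(home_x: float, home_y: float, schools: List[School]) -> Optional[float]:
    """Share of reachable schools offering triple science (None if none is reachable)."""
    reachable = [
        s for s in schools
        if math.hypot(s.x_km - home_x, s.y_km - home_y) <= s.radius_km
    ]
    if not reachable:
        return None
    return sum(s.offers_triple_science for s in reachable) / len(reachable)


# A neighbourhood at (1, 0) with two nearby schools, in the spirit of Figure 3.
schools_t_minus_1 = [School("A", 0.0, 0.0, 2.0, True), School("B", 3.0, 0.0, 1.5, False)]
schools_t = [School("A", 0.0, 0.0, 2.0, True), School("B", 3.0, 0.0, 2.5, False)]
share_t_minus_1 = reachable_share(1.0, 0.0, schools_t_minus_1)  # only A is reachable
share_t = reachable_share(1.0, 0.0, schools_t)                  # A and B are reachable
```

Here the catchment area of school B expands between the two years, so the instrument falls from 1 to 0.5 even though neither school changes its course offer, which is the kind of variation the instrument relies on.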

4 Results 

This section shows results obtained with the first instrument. I first show the overall effect of 
taking more science in secondary school in terms of post-16 outcomes (Subsection 4.1) and I 
explore whether the effect is stronger for girls than for boys. Second, I describe who decides 
to take triple science, when exposed to the option of taking it, by characterizing compliers 
(Subsection 4.2) and, in particular, by analyzing whether boys are more likely than girls to 
select triple science at age 14. Finally, I check the identifying assumptions and whether the 
main findings are robust to the second identification strategy (Subsections 4.3 and 4.4). 

21 With some exceptions for students with siblings attending the same school or for students with special 
education needs. Since I do not have the full set of information necessary to simulate the exact admission 
formula for each school, I cannot adopt an RDD strategy. 

22 In order to exclude exceptions I eliminated outliers (the distances higher than the 5th percentile for every 
school). 

23 In total there are more than 30,000 LLOAs in England and Wales and each LLOA contains on average 1500 
households. 

4.1 Main Results 

Table 2 presents the main estimates of the effect of taking triple science at age 14 on the 
probability of choosing at least one natural science subject at age 16 (KS5) and a STEM degree 
at the university. 24 The Table proceeds by estimating the effect of interest under different 
specifications. Column 1 displays results from a simple OLS regression; in column 2 I add 
school fixed effects; column 3 follows Altonji [1995] and uses as instrument for triple science the 
share of students taking triple science in school s and year t; column 4 uses my first instrument 
($z_{st}$) and some school time-varying controls 25 , but does not include school fixed effects; column 
5 shows results from my preferred specification, which uses my instrument and exploits within-school 
over-time variation only; finally, column 6 adds a school-specific trend. Reassuringly, the 
coefficients of columns 5 and 6 are very similar, suggesting that schools offering triple science are 
on a similar trend. Column 7 estimates the specification of equation 3, but it eliminates controls 
($X_{ist}$). The coefficients of columns 5 and 7 are again very similar, suggesting that, conditional 
on my fixed effects, the instrument is quasi-randomly assigned. As expected, the bias in the 
OLS estimates is upward: the coefficient gets smaller as I correct for the different layers 
of selection. The Table shows that, if a student strengthens her science preparation at age 14, 
she is 5 percentage points more likely to take science at age 16 and 1.5 percentage points more 
likely to choose a STEM degree at the university. 

Table 3 shows the coefficients obtained from estimating equation 3 on other outcomes at age 
14 (KS4), age 16 (KS5) and university. The top panel shows results on KS4 grades and on the 
number of exams taken in KS4 and KS5. Since triple science is more difficult, taking it reduces 
the average science grade at KS4. Columns 2 and 3 show that there are no spillovers on other 
subjects’ grades. Columns 4 and 5 investigate whether the total number of qualifications taken 
at age 14 and 16 changes, as a consequence of the new course offered. The results show that 
the number of exams taken at age 14 slightly increases. 

The second panel refers to outcomes at age 18, the results of KS5 exams. Column 1 shows that 
the policy does not have any effect on the probability of continuing to study at age 16, probably 
because the instrument mainly affects high ability students, who would continue to study in any 
case. A change in the probability of enrolling in science subjects at age 16 may be driven 
both by a change in the likelihood of continuing to study after age 16 and by a change in the 
likelihood of choosing science subjects, conditional on continuing. Column 1 therefore implies that the 
coefficient estimated on KS5 subjects comes entirely from the second component, 
because the first is not affected by the policy. The result displayed in column 2 shows that 
the effect of studying triple science is not limited to the pure natural science subjects but it 
has spillovers on math, for instance. The third panel refers to university outcomes. Column 1 
shows again that the policy does not have any effect on the probability of continuing to study 
at the university. 26 The other columns show the effect on choice of degree and on the quality 
of the institution attended. Students taking triple science are more likely to attend institutions 
belonging to the Russell group. 27 Moreover, studying more science in secondary school also 
increases the probability of graduating on time in STEM degrees. 28 This is extremely relevant 
given the large debate that is taking place in many countries, the US in particular, about 

24 The dependent variables in all cases are dummies equal to one if students attend a certain course and equal 
to 0 if they do not attend those courses or do not continue studying. 

25 In particular, the share of girls attending school s in year t and the share of FSME (Free School Meal Eligible). 
In the spirit of Joensen and Nielsen [2009, 2016]. 

26 Note that even if the magnitude of the coefficient is similar to the other coefficients, the baseline in this case 
is much larger: the average is 36%. 

27 The Russell group represents 24 leading UK universities in terms of research and teaching. 

28 The results on university outcomes are estimated on students taking the final KS4 exam in the years 2005-2007 
only, otherwise there is no information on whether the students graduated from university. 

the low persistence of students in scientific fields [Arcidiacono et al., 2016, Stinebrickner and 
Stinebrickner, 2014]. 

Table 4 shows that the effect masks substantial gender heterogeneity 29 : while girls are 
affected by the policy (for instance, they are induced to take more medicine or biological sciences), 
the effect on pure STEM degrees is entirely driven by boys. Some studies claim girls may shy 
away from STEM degrees because of fear of competition or lack of confidence about their ability 
[Buser et al., 2014, Niederle and Vesterlund, 2010], suggesting that increasing preparation and 
fostering scientific culture in secondary schools may shrink the gender gap in STEM degrees. 
My results suggest instead that strengthening the science curriculum at age 14 does not help close this gap. 
It may increase the share of girls taking science at age 14 and age 16, but it does not affect the 
share of girls choosing STEM subjects at the university. This is in line with the findings of some 
recent studies [Gemici and Wiswall, 2014, Zafar, 2013] showing that differences in preferences 
are the main driver behind the gender gap in college degrees; and preferences are difficult to 
shape through secondary school courses. My results are complementary to what is found in Joensen 
and Nielsen [2016] for Denmark. Joensen and Nielsen [2016] estimate very positive effects both 
for boys and for girls on the probability of choosing technical subjects at the university for 
students taking advanced math in secondary school. A first reason behind the difference in our 
results may be that they find a rather large effect on the probability of attending university 
as well, given their instrument affects slightly lower ability students than in this case. Their 
effect may therefore be the combination of changes in the pool of students attending university 
and changes in the willingness to choose STEM subjects, conditional on going to university; 
my effect instead comes exclusively from the second component. A second reason is related to 
differences in the type of compliers. As also pointed out by Joensen and Nielsen [2016] and 
extensively addressed for the regressions on earnings, the policy they analyze affects girls much 
more than boys and compliers for the two groups of students are likely to be very different. 
This makes the coefficients of the IV difficult to compare across genders. As I will address 
more extensively in Subsection 4.2, my instrument affects boys and girls in a very similar way. 

Tables A4 and A5 moreover explore the extent and the presence of subject complementarity 
and substitutability. If a student takes more science at age 14, which other (complement) subjects 
is she more likely to take and, more importantly, from which (substitute) subjects does she 
opt out? Table A3 in the Appendix shows the coefficients and standard errors obtained from 
estimating equation 3 using each time a different KS4 subject as dependent variable. Tables 
A4 and A5 report the same type of estimates but with respect to KS5 subjects and university 
degrees, respectively. Students who take triple science at KS4 tend to drop more vocational 
subjects, some foreign languages like German and some other core subjects like history. In 
terms of KS5 courses, taking triple science induces students to choose more natural science 
subjects and math later on, and to drop more vocational subjects, like media and accounting. 
Finally, triple science increases the probability of choosing scientific subjects at the university, 
like physics, engineering and medicine, but also non-scientific but more challenging subjects, like 
classical languages. It decreases, instead, the probability of enrolling in law and architecture. 
The effects are different for boys and girls, especially for university degrees. 

It is difficult to draw general conclusions from the coefficients of Tables A3, A4 and A5: 
anecdotal evidence may suggest that a vocational course in music is very different from an 
advanced course in science at age 14, but to evaluate each subject according to some objective 

29 As shown in Table A1 of the Appendix, there are other interesting sources of heterogeneity. The group most 
affected by the policy is the middle-high ability students. The very high ability students would probably be 
very well prepared in any case and are less likely to be at the margin; the low ability students are instead less 
likely to be affected by the policy at all. Moreover, the effect on science at age 16 is slightly stronger for low SES 
students; the effect on university outcomes is instead more difficult to estimate with enough precision for low 
SES students because of the small share (20%) of low SES students attending university. 

criteria, Table 5 uses a more formal procedure. I define courses along two dimensions: (i) ‘high 
achievers’ courses, characterized by a high average primary school grade of students choosing 
them in out-of-sample academic years; (ii) ‘female dominated’ courses, characterized by a high 
share of girls attending the courses in out-of-sample academic years (2002-2005). Figure 4 
describes each subject along these dimensions. In particular, it shows three scatterplots that display, 
for each course, the share of girls usually enrolled in it on the x-axis and the average primary 
school grade of students attending it on the y-axis. Triple science stands out as 
the course at KS4 that is attended by the best students, followed by foreign languages, history 
and geography. With respect to KS5 options, math is the most challenging course, followed 
by physics, chemistry and foreign languages. For university degrees, medicine, languages and 
STEM subjects are attended by very good students while education, subjects allied to medicine 
and art are attended by the worst students on average. The correlation between the ability of 
students usually attending each course and the share of girls enrolled in those courses is negative. 
This is surprising, given that on average girls have higher grades than boys in primary school. 
Table 5 shows whether students start choosing more ‘high achievers’ courses at age 18 (KS5) and 
at the university as a consequence of taking more science at KS4. 30 Taking advanced science 
at age 14 induces students to choose more challenging subjects later on. Students taking triple 
science are induced to choose at age 16 courses usually attended by students whose average 
grade in primary school is about 0.2 standard deviations higher. The same is true for university 
degrees, but the magnitude of the effect is smaller. Moreover, for KS5, I disentangle how 
much of the reported increase is automatically due to the higher probability of choosing natural 
science subjects and how much to the fact that students choose other (complement) more ‘high 
achievers’ subjects, different from the three natural sciences. I find that the increase is partly 
driven by a higher probability of choosing science courses (63%) and partly due to a higher 
willingness to enroll in other difficult subjects not strictly in the natural science field (37%). 31 
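
The aggregation behind Table 5, in which subject-level coefficients are weighted by subject characteristics and summed, can be sketched as follows. For a linear combination with weights treated as fixed, the Delta method reduces to the quadratic form w'Vw. The sketch assumes that a covariance matrix of the subject-level coefficients is available; the function name and all numbers in the example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def weighted_effect(
    coefs: Sequence[float],
    weights: Sequence[float],
    cov: List[List[float]],
) -> Tuple[float, float]:
    """Weighted sum of coefficients and its Delta-method standard error.

    coefs:   effect of triple science on the probability of choosing each subject
    weights: subject characteristic (e.g. average primary school grade of the
             students usually enrolled), treated as a fixed number
    cov:     covariance matrix of the coefficient estimates (assumed available)
    """
    k = len(coefs)
    if len(weights) != k or len(cov) != k or any(len(row) != k for row in cov):
        raise ValueError("dimensions of coefs, weights and cov must agree")
    effect = sum(w * b for w, b in zip(weights, coefs))
    # Delta method for a linear function g(b) = w'b: Var(g) = w' V w.
    variance = sum(weights[i] * cov[i][j] * weights[j] for i in range(k) for j in range(k))
    return effect, math.sqrt(variance)


# Hypothetical example with two subjects and uncorrelated estimates.
effect, std_err = weighted_effect(
    coefs=[0.05, -0.02],
    weights=[0.8, -0.3],
    cov=[[0.0004, 0.0], [0.0, 0.0001]],
)
```

In the example the weighted effect is 0.8 × 0.05 + (−0.3) × (−0.02) = 0.046, and the variance is 0.64 × 0.0004 + 0.09 × 0.0001 = 0.000265.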

The other columns look at the sample of boys and girls separately. The first row shows 
that girls who take triple science are induced to choose more challenging subjects (i.e. more 
‘high achievers’ subjects) in about the same proportion as boys; the second row shows that 
they still opt for female-dominated subjects (like medicine, for instance). This is an interesting 
result: while at age 16 girls taking triple science still opt for more male-dominated subjects 
(physics or math for instance - even if to a lower extent than for boys), strengthening the 
science preparation in secondary school does not have any effect on the likelihood that girls opt 
for STEM (male-dominated) subjects at the university. This suggests that once the subject 
choice is actually related to the characteristics of their future jobs, girls still prefer the most 
female-dominated degrees. 

4.2 Compliers’ characterization 

This Section analyses who decides to take triple science, when the school offers it. This helps 
understand how students make decisions about which subject to take at age 14 and whether 
the heterogeneity in the $\beta_1$ coefficient, especially along the gender dimension, is actually driven 
by differences in the treatment effect or by differences in compliance across genders. Even if 
teachers in England usually make recommendations about which field courses to choose, the 
actual choice of whether to take triple science or not is a free decision made by students. 32 

30 To obtain these results I multiply the coefficients displayed in Tables A3, A4 and A5 by the numbers displayed 
in Figure 4 and I sum the series. Standard errors are computed through the Delta method. 

31 This result is available upon request. 

32 One caveat should be considered when interpreting the results: sometimes the supply of triple science is 
constrained since classes in England cannot be larger than 30. Since schools mainly prioritize based on previous 
science and math scores, any differences in the probability of taking triple science based on previous test scores 

Pupils will choose to take triple science if their expected utility when D = 1 is higher 
than their expected utility when D = 0. This may happen because triple science reduces their 
costs (or their perception of the cost) of graduating in certain degrees or of graduating at all or 
because triple science directly increases their productivity, and therefore wage. The contribution 
in terms of utility of taking triple science with respect to the second best option, will not be 
the same for all students: those already very good in science or with very strong preferences 
towards other subjects may not find it as beneficial to take triple science. 33 This means that 
the likelihood of taking triple science will not be the same for everybody: it will depend on 
preferences, on innate ability and on perceptions towards their ability. 

The first row of Table 6 shows results from the first stage regression. Being unexpectedly 
exposed to the offer of taking triple science increases students’ probability of enrolling in it by 
15 percentage points. The F statistic is around 2800. 

Table 6 then characterizes compliers for the entire population and for boys and girls separately 
(columns 2 and 3, respectively). I obtain information on compliers’ characteristics looking at 
the first stage for several subgroups of the population. For instance, the ratio between the 
instrument’s coefficient of the first stage estimated on the sample of females only (0.149) and 
the coefficient of the first stage estimated on the entire sample (0.163) represents the relative 
likelihood that a complier is female. 34 The Table shows that compliers are more likely to 
be very good students in primary school: the relative likelihood that a complier is in the top 20 
percent of test scores in primary school is more than two. Moreover, compliers tend to be high 
income students and, interestingly, there does not seem to be any particular gender difference 
in compliance. The second and the third columns compare compliers for the subgroups of girls 
and boys respectively and show that compliers’ characteristics are very similar between these 
two groups. 
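
The ratio used to characterize compliers can be reproduced directly from the two first-stage coefficients quoted above. The snippet below is only a worked check of that arithmetic; the function name is hypothetical.

```python
def relative_complier_likelihood(first_stage_subgroup: float, first_stage_overall: float) -> float:
    """Relative likelihood that a complier belongs to a subgroup.

    Computed as the first-stage coefficient estimated on the subgroup divided
    by the first-stage coefficient estimated on the entire sample.
    """
    if first_stage_overall == 0:
        raise ValueError("the overall first stage must be non-zero")
    return first_stage_subgroup / first_stage_overall


# First-stage coefficients quoted in the text: females only and entire sample.
ratio_female = relative_complier_likelihood(0.149, 0.163)
print(round(ratio_female, 3))  # 0.914
```

A ratio close to one, as here, indicates that compliers are about as likely to be female as students in the sample as a whole.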

4.3 Checks to the identification strategy 

As stated in Section 3, the instrument used in the analysis relies on some assumptions. 

The first assumption is that the information set of students in the treatment and the control groups 
at age 11 is the same and does not include information on whether the schools 
not offering triple science when students apply are going to offer it in three years. To check 
this assumption I include all schools in the sample (both offering and not offering triple science 
when student i applies) and I estimate the following equation: 

$$W_{ist} = \alpha_1 z^{11}_{st} + \alpha_2 z_{st} + \alpha_3 X_{ist} + \zeta_s + \zeta_t + \eta_{ist} \qquad (5)$$

where $W_{ist}$ are several outcomes (like the dummy for whether student i chooses a STEM 
degree or whether she graduates in it) or pre-determined characteristics (like the average science 
grade in secondary school, her gender etc); $z^{11}_{st}$ is a dummy equal to 1 if school s attended by 
student i in cohort t offered triple science when the student was 11 and chose her secondary 
school and $z_{st}$ is my usual instrumental variable. In this way I test the extreme assumption 
that, even when parents or students know the school is offering triple science when applying, 
they do not select schools accordingly. Table 7 shows the results with (panel 1) and without 
(panel 2) school specific trends. The coefficient $\alpha_1$ is not significant for most variables and in 
any case is usually extremely small: students applying to schools already offering triple science 
may not be driven by students’ willingness to take triple science, but by schools’ admission rules. 

33 Unless triple science has a positive effect also in reducing the cost of taking exams in other subjects, for 
instance through changes in self confidence. 

34 First stages in this case do not include any control apart from year and school fixed effects. This does not 
affect the estimates of interest because controls are not correlated with the instrument. 

or not offering it appear very similar, at least in terms of observable characteristics. This is 
consistent with the notion that students cannot freely choose their schools because schools, 
when oversubscribed, have to select students based on geographical distance. 

Second, the assumption that schools decide when to start offering triple science not based 
on the quality of the current cohort attending the school and not because the school is already 
on an increasing trend. Table 8 provides evidence that, when using my identification strategy, 
the timing of the introduction of the triple science option is not correlated with (observable) 
characteristics of current students in the school. The Table runs a set of placebo tests, where I 
estimate the reduced form of equation 3 (without controlling for $X_{ist}$) and where the dependent 
variable is a pre-determined characteristic, the grade in the science course in primary school. 
The triple science dummy (TS) in this case should not be significant, because the instrument 
should not be correlated with the grade at KS2, unless my specification does not take full care of 
selection. The Table has the same structure as Table 2 and it shows how different identification 
strategies may fail to address selection. Column 1 shows results from a simple OLS regression, 
column 2 adds school fixed effects, column 3 replicates the specification used by Altonji [1995] 
and uses as instrument the share of students taking triple science in school s and year t; column 
4 uses my instrument but does not include school fixed effects. 35 Column 5 also includes school 
fixed effects. Reassuringly, the effect in this case is 0. Finally, column 6 adds school-specific 
time trends, and the coefficient is again 0. Table A2 in the Appendix shows results from a set 
of other balancing tests obtained by estimating the same specifications as in columns 5 and 6 for 
a set of other predetermined observable characteristics. All balancing tests show that the 
treatment is not correlated with observable characteristics of the current students in the school. 

Moreover, I check for the presence of parallel trends. In particular, I check whether, before 
school s started to offer triple science, the trend was parallel to that of all other schools still 
not offering triple science. I augment my reduced form regression with leads and lags of the 
instrument (following Autor [2003]): 

$$Y_{ist} = \sum_{t=0}^{m} \gamma_{\tau-t}\, z_{s(\tau-t)} + \sum_{t=0}^{q} \gamma_{\tau+t}\, z_{s(\tau+t)} + \zeta_t + \zeta_s + u_{ist} \qquad (6)$$

where $z_{st}$ is my instrument, $\tau$ is the year school s starts offering triple science, $\zeta_s$ and $\zeta_t$ are the 
usual school and year fixed effects and $u_{ist}$ is the error term. I then check for the presence of 
parallel pre-treatment trends by evaluating whether all coefficients $\gamma_{\tau-t}$ are close to 0, for every 
t. Figure 5 shows that the trends are parallel before the introduction of the advanced science 
course and that there is a jump in the outcomes and in the treatment exactly in the 
year of the introduction of the new course. 36 This confirms the results obtained in Tables 7 and 
8. 
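
The leads and lags entering a regression of this kind are usually built as event-time indicators around the year in which each school introduces the course. The sketch below shows one way to construct them. The function name, the window lengths and the treatment of schools that never introduce the course are assumptions, and in practice a reference period is typically omitted when the indicators are used as regressors.

```python
from typing import Dict, Optional


def event_time_dummies(
    cohort_year: int,
    start_year: Optional[int],
    n_pre: int,
    n_post: int,
) -> Dict[int, int]:
    """Indicators for event time k = cohort_year - start_year, k in [-n_pre, n_post].

    start_year is the first year the school offers triple science; schools that
    never introduce it (None) get zeros everywhere. Cohorts outside the window
    also get zeros (no binning of endpoints in this sketch).
    """
    dummies = {k: 0 for k in range(-n_pre, n_post + 1)}
    if start_year is not None:
        k = cohort_year - start_year
        if k in dummies:
            dummies[k] = 1
    return dummies


# A cohort observed two years before its school introduces the course.
example = event_time_dummies(cohort_year=2006, start_year=2008, n_pre=3, n_post=2)
```

Coefficients on the indicators with negative event time play the role of the pre-treatment coefficients: if trends are parallel, they should be close to zero.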

Another possible concern is that, once a school sets up all arrangements in terms of teaching 
qualifications and staff in order to offer triple science, it may start to offer more science courses 
at KS5 as well. In England about 60% of the schools offer both KS4 (age 14) and KS5 (age 16) 
exams. This would imply that part of the effect I find may be purely mechanical: students take 
more KS5 science courses because the set of options changes also at KS5. I address this concern 
in Table 9. Columns 1 and 2 look at how the probability of offering science at KS5 evolves over 
time and whether it corresponds exactly to the cohort when the school starts offering triple 
science at KS4. The correlation is 0. Columns 3 and 4 look at whether the effect of studying 
triple science on the probability of choosing science at KS5 is larger for schools offering both 

35 This column partly replicates, even if in a very different context, Joensen and Nielsen [2016]. 

36 I also estimated the same graphs using predetermined characteristics as dependent variables: in this case 
there is no jump at year 0, nor at year -3, which corresponds to the time when students know, when applying, that 
the school offers triple science. These results are available upon request. 

KS4 and KS5 courses than for schools offering KS4 courses only. The effect is identical. If part 
of the effect I find were mechanical, it would be stronger for schools offering both 
KS4 and KS5 exams. 

Moreover, one may worry that taking triple science could potentially directly affect the 
possibility of being admitted to STEM degrees at the university. However, while universities 
often require some KS5 subjects in order to admit students to certain degrees, in no case do they 
require specific KS4 subjects. For instance, in 2013, a KS5 exam in math was required in 13% 
of the cases (i.e. of degree-university combinations) and at least one KS5 exam in science was 
required in 12% of the cases. In no case 37 was there, in 2013, a specific requirement for age 14 
(KS4) subjects. 

Finally, it may be that the simple fact of having the possibility of being enrolled in advanced 
science but having been excluded, for example because the class was oversubscribed and schools 
had to select students, may generate a direct effect on some students and may therefore violate 
the exclusion restriction assumption. This is impossible to test. Table A6 however exploits 
some of the institutional features of the English school system to evaluate how problematic this 
may be. Figure 6 plots the distribution of the size of triple science courses in each school. From 
the Figure it is clear that class size bunches at multiples of 30. There is a discontinuity both 
at 30 students and at 60 students. Since class size in England is 
required not to exceed 30, this Figure suggests that in some cases the triple science course 
was oversubscribed, and schools had to select students. Unfortunately the exact admission rule 
is different for each school and is not publicly available. Table A6 exploits this feature of the 
system and runs the main specification (using equation 3) on the sample of schools where the 
triple science course was very likely not oversubscribed, because the number of enrolled students 
was not close to the maximum. 38 The results of this exercise are very similar to the main ones. 
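
The sample restriction described above can be sketched in a few lines of Python. This is only an illustration: the enrolment counts and school names are invented, the function name is hypothetical, and the bands 28 to 32 and 58 to 62 are assumed to be inclusive at both ends.

```python
def likely_not_oversubscribed(n_enrolled: int) -> bool:
    """Return True when triple science enrolment is not close to a class-size cap.

    Assumption: "close to the cap" means 28-32 or 58-62 students, inclusive.
    """
    near_one_class = 28 <= n_enrolled <= 32
    near_two_classes = 58 <= n_enrolled <= 62
    return not (near_one_class or near_two_classes)


# Hypothetical enrolment counts by school, invented for illustration only.
enrolment = {"school_a": 17, "school_b": 30, "school_c": 45, "school_d": 59}

# Schools kept in the restricted sample.
kept = sorted(s for s, n in enrolment.items() if likely_not_oversubscribed(n))
```

Here `kept` retains only the schools whose enrolment lies away from the bunching points, which is the sample on which the main specification would be re-estimated.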

4.4 Second instrument 

Table 10 shows the results obtained from my second identification strategy. 39 The first three 
columns refer to the probability of choosing a natural science subject at Key Stage 5 (age 18), 
the last three columns refer to the probability of attending a STEM degree at the university. 40 
The first and the fourth columns do not include neighbourhood fixed effects, but control for the 
lagged value of my instrument: they compare neighbourhoods which had the same share of 
reachable schools offering the triple science course the previous year and they exploit variation 
between t and t - 1. All other columns include neighbourhood fixed effects. 

This instrument compares students living in the same neighbourhood but attending different 
schools, which may or may not offer triple science. However, the probability of offering triple 
science is likely to be related to other school characteristics, like school quality, that may 
directly affect the choice of degree at the university. Since the variation in school quality may 
be much larger when using across-school variation than when using within-school variation over 
time, as with the previous instrument, in Columns 3 and 6 I include the average quality of the 
set of reachable schools in year t as a control. I proxy school quality using the school value 
added in the out of sample years (2002-2005). 

37 Data are taken from http://www.thecompleteuniversityguide.co.uk/courses/search 

38 Those schools where the number of students enrolled in the triple science classes was not between 28 and 32 
or between 58 and 62. 

39 Since there is no information on postcode in primary school for students who finished secondary school in the 
years before 2007, this section only refers to the years 2007-2010. For these cohorts, however, I have information 
on whether they graduated only for the students who took KS4 exams in the year 2007, so I only analyze effects 
on enrollment and on KS5 outcomes. 

40 The effect on the probability of attending university is 0, as for the previous instrument. 
The results confirm the robustness of the first identification strategy: the estimated effects 
are positive and significant, and the effects on STEM degrees are stronger for boys than for 
girls. 41 The estimates obtained through this strategy are, however, slightly larger. This may be 
related to the different type of variation, and therefore of compliers, exploited. Compliers 
for the first instrument are all individuals who take triple science because their school 
unexpectedly starts to offer it, a group that also includes very good students who happened to be 
enrolled in a school not offering triple science. Compliers for the second instrument are students 
who take triple science because, thanks to a larger supply of triple science in the set of 
reachable schools in their neighbourhood, they manage to enroll in a school offering it. In this 
second case, very good students would probably have enrolled in a school offering triple science 
in any case. This suggests that compliers for the second strategy exclude the extremely high 
ability students. Since, as shown in Table A1 in the Appendix, those most affected by the policy 
are middle-high ability students, this may explain the larger effect found in Table 10. 

5 Alternative Mechanisms 

This Section explores the mechanisms that may generate the effect found in Section 4 and 
explores whether the effect obtained is actually generated by changes in curriculum or, since 
the treatment has multiple components, it is also driven by changes in the peer composition of 
the courses attended or in the type of teachers in the school. 

5.1 Peers 

First, I analyse the peers channel. In particular, I use the following measure of peer quality in 
science ($Q_{ist}$) for student $i$, attending school $s$ in year $t$, who takes science course $D_{ist}$: 

$$Q_{ist} = \bar{X}^{D}_{(-i)st} \qquad (7)$$

where $\bar{X}^{D}_{(-i)st}$ is the average science grade in primary school of students taking age 14 
science course $D$ 42 in school $s$ in year $t$ (excluding $i$). 
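
As an illustration of this leave-one-out average, the following Python sketch computes, for each student, the mean primary school science grade of the other students taking the same science course in the same school and year. The record layout, the function name and the toy grades are assumptions made for the example, not taken from the NPD data.

```python
from collections import defaultdict


def leave_one_out_peer_quality(records):
    """Peer quality per student: mean grade of course-mates, excluding the student.

    records: list of (student_id, school, year, course, primary_science_grade).
    Returns a dict student_id -> peer quality (None when the student has no peers).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, school, year, course, grade in records:
        key = (school, year, course)
        totals[key] += grade
        counts[key] += 1

    quality = {}
    for sid, school, year, course, grade in records:
        key = (school, year, course)
        n_peers = counts[key] - 1
        # Subtract the student's own grade before averaging over the peers.
        quality[sid] = (totals[key] - grade) / n_peers if n_peers > 0 else None
    return quality


# Toy data, invented for illustration.
toy = [
    (1, "A", 2008, "triple", 5.0),
    (2, "A", 2008, "triple", 6.0),
    (3, "A", 2008, "triple", 7.0),
    (4, "A", 2008, "double", 4.0),
]
peer_quality = leave_one_out_peer_quality(toy)
```

Because the class is not observed, the grouping key is the course type within a school and year, mirroring the approximation described in footnote 42.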

The first panel of Figure 7 shows how peers’ composition in the science course taken at age 
14 differs between schools offering triple science and schools not offering it. The dashed line 
plots the density of $Q_{ist}$ in the age 14 science course for students attending schools not 
offering triple science. The solid line refers instead to schools offering triple science. The 
figure shows that when schools offer triple science there is a concentration of very high ability 
students who attend the science class with peers of much higher quality than before. Column 1 of 
Table 11 confirms this finding: it shows how peers’ quality in science courses changes after the 
school starts offering the advanced science course, depending on students’ primary school grade 
in science. The quality of peers in the science class decreases for lower ability students and 
increases quite extensively for higher ability students. 

To control for this dimension and check whether the effect found in Table 3 comes mostly 
from changes in the peer composition or from changes in the curriculum, I control for peer 
quality in equation 3. Since students self-select into different types of science course at age 
14, peers’ quality may be endogenous. I therefore instrument peer quality by using within- 
school over-time changes in peers’ composition (following Hoxby [2000]). In particular, I use 
the fact that classes in England cannot be larger than 30 (as shown in Figure 6). 43 I therefore 

41 Results available upon request. 

42 Since there is no information about the exact class but only about the type of science course, I use the average 
grade in primary school of students taking the same course. 

43 While for primary schools this requirement is compulsory, it is just recommended for secondary school. 
predict, based on predetermined characteristics like previous test scores and demographics, 44 
the probability of being enrolled into triple science and I take the average science grade in 
primary school of the 30 or 60 students (depending on the number of triple science classes 
offered) with the highest probability of being enrolled into triple science. I then exploit within 
school over time variation in the average quality of these students and of all other students in 
school s and year t, allowing the effect to be different depending on whether the school offers 
(unexpectedly) triple science or not. My first stage equation is: 


$$Q_{ist} = \theta_1 Z_{st} + \theta_2 \bar{Q}^{TS}_{(-i)st} + \theta_3 \bar{Q}^{other}_{(-i)st} + \theta_4 \bar{Q}^{TS}_{(-i)st} \times Z_{st} + \theta_5 \bar{Q}^{other}_{(-i)st} \times Z_{st} + \theta_6 X_{ist} + \theta_s + \theta_t + \eta_{ist} \qquad (8)$$

where $Z_{st}$ is the first instrument (a dummy equal to 1 if student $i$ was unexpectedly exposed 
to the option of choosing triple science); $\bar{Q}^{TS}_{(-i)st}$ is the average science grade in 
primary school of the 30 (or 60) students with the highest predicted probability of being enrolled 
in triple science and $\bar{Q}^{other}_{(-i)st}$ is the average science grade in primary school of 
all other students; $X_{ist}$ are student controls; $\theta_s$ and $\theta_t$ are school and year 
fixed effects and $\eta_{ist}$ is the error term. Panel b of Figure 7 shows how the instrument 
works. The solid line refers to the average science grade in primary school for students predicted 
to attend the triple science class, the dashed line refers to all other students. 
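
The construction of the two group averages can be sketched as follows. This is a minimal illustration, assuming the predicted probabilities have already been estimated from predetermined characteristics; the function names, the tuple layout and the toy numbers are hypothetical, and a class size of 2 is used in the example only to keep it small.

```python
def mean_grade(group):
    """Average primary school science grade of a list of (prob, grade) pairs."""
    return sum(grade for _, grade in group) / len(group) if group else None


def predicted_group_means(students, n_classes, class_size=30):
    """Split one school-year cohort by predicted triple science enrolment.

    students: list of (predicted_probability, primary_science_grade).
    The n_classes * class_size students with the highest predicted probability
    form the predicted triple science group; everyone else forms the other group.
    """
    capacity = n_classes * class_size
    ranked = sorted(students, key=lambda s: s[0], reverse=True)
    return mean_grade(ranked[:capacity]), mean_grade(ranked[capacity:])


# Toy cohort, invented for illustration.
school_year = [(0.9, 6.0), (0.1, 3.0), (0.8, 5.0), (0.2, 4.0)]
top_mean, rest_mean = predicted_group_means(school_year, n_classes=1, class_size=2)
```

Variation in these two averages within a school over time, rather than the realised composition of the class, is what would then enter the first stage.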

Table 11 displays the results. Columns 2 to 6 show that the effect of triple science is very 
similar to that found before, even after controlling for changes in peers’ quality. The joint F 
statistic is 35. 


5.2 Teachers 

Unfortunately, it is not possible in England to link data on individual teachers to administrative 
data on individual students. In this section I use the yearly number of teachers and of qualified 
teachers in each school. Table A7 in the Appendix shows that neither the overall number of 
teachers nor the number of qualified teachers in a school change significantly once the school 
introduces the triple science option. This suggests that teachers’ quality and quantity do not 
increase as a result of the introduction of the advanced science course. 


6 Conclusions 

This paper uses a reform that increased the probability of taking an advanced science course in 
English secondary schools for students at the top of the ability distribution to analyze whether 
secondary school curriculum affects post-16 outcomes, and in particular the probability of en- 
rolling and graduating in a STEM degree. Moreover, by separately investigating the effect on 
boys and girls, this paper seeks to understand whether strengthening school preparation in 
science shrinks the gender gap in enrollment in STEM degrees. 

Since the policy I consider affected very high ability students, who would have continued 
studying in any case, I find that a stronger science curriculum in secondary school has no effect 
on university enrollment. Still, my estimates suggest that offering more science in secondary 
school improves educational outcomes in many domains. It induces students to attend higher 
quality universities and significantly increases the probability of enrolling and, very importantly, 
of graduating from university with a STEM degree. This effect masks a substantial and inter- 
esting gender heterogeneity: at age 14 when exposed to the option of studying more science 

44 In particular, KS2 and KS3 science grades (both teacher assessed and from standardized exams), gender, 
Free School Meal Eligibility. 
in secondary school, there is no gender difference in the take-up probability. However, the 
difference arises later on, at university, when subject choices are likely to be correlated with 
occupations and jobs: both boys and girls are induced to take more challenging courses on 
average, but girls still choose more female-dominated subjects like medicine, instead of 
engineering and math. This seems to be in line with the recent literature relating preferences 
over job attributes to choices of university degrees [Wiswall and Zafar, 2016, Reuben et al., 
2015, Zafar, 2013], which shows that job characteristics play an important role in the choice of 
subjects at university, with women and men displaying very different preferences, even at the 
very top of the ability distribution. 

My findings show that there is a certain degree of persistence between what is studied at 
secondary school and what is studied at the university. An optimal design of the secondary 
school curricula may be useful to improve the match between supply and demand of specific 
skills. 
Figures 


Figure 1: Take up in triple science 

[Figure not reproduced. Horizontal axis: KS4 academic year. Series: % of schools offering TS; % of high ability students in TS; % of low ability students in TS.] 



Source: NPD dataset. The bars represent the share of schools offering triple science; the red dots 
represent the share of high ability (based on English, math and science primary school grade, top 40 
%) students taking triple science and the blue dots show the share of low ability (based on primary 
school grades, bottom 60 %) students taking triple science, by year. 


Figure 2: Timeline of the English educational system 
Figure 3: Second instrument 

Figure 4: Subject descriptives 

[Figure not reproduced. Three panels: KS4 courses (age 14); KS5 courses (age 16); University courses (age 18). Horizontal axis in each panel: % female (std).] 

Source: NPD dataset. Subjects are described along two dimensions: the average primary school 
grade (in English, math and science) of students taking the course in out of sample years and the 
share of girls taking the course in out of sample years. The circles around each observation represent 
the number of students attending these courses. 


Figure 5: Parallel Trends: Leads and Lags of the instrument 

[Figure not reproduced. Three panels, by dependent variable: 1=Triple Science; 1=A lev Science; 1=Uni STEM. Horizontal axis in each panel: lags.] 

Source: NPD dataset. The continuous line represents coefficients, the dashed lines the 5% confidence 
intervals, obtained from estimating equation 6. Omitted category: one year before the treatment. 
Figure 6: Class size and number of students in triple science 

[Figure not reproduced.] 

Source: NPD dataset. The dots are the number of schools, by triple science class size. 
Figure 7: Peers 

[Figure not reproduced. Two panels: "Actual peers’ quality" (series: no offer TS; offer TS) and "Instrument".] 

Source: NPD dataset. The first panel plots the distribution of science peers’ quality, distinguishing 
whether the school offers triple science or not. The second panel plots the average peers’ quality for 
students predicted to take the TS class and students not predicted to take the TS class. 
Tables 


Table 1: Summary statistics 

Variable                     Mean      Std. Dev. 

Key Stage 4 
offer TS (unexpected)        0.196     0.397 
1=Triple Sci                 0.076     0.264 
1=Double Sci                 0.764     0.425 
1=Single Sci                 0.163     0.369 

Key Stage 5 
1=KS5 science (if KS5)       0.198     0.282 
1=KS5 math (if KS5)          0.142     0.252 

University 
1=uni                        0.348     0.470 
1=STEM a                     0.126     0.198 
1=Russell                    0.046     0.211 
1=graduate a                 0.481     0.361 

Demographics 
1=female                     0.497     0.500 
1=FSM eligible b             0.144     0.356 

The summary statistics reported in the Table refer 
to the entire sample of students taking their final 
KS4 exams (at age 16) between 2005 and 2010. 

a Conditional on going to university. 
b Free School Meal Eligible. 
Table 2: Results for science at age 17 and 19 

                 OLS        OLS-Fe     Altonji    IV         IV-Fe      IV-Fe tr   IV-Fe 
                 [1]        [2]        [3]        [4]        [5]        [6]        [7] 

Dep var: 1=KS5 Science 
1=TS             0.334***   0.257***   0.147***   0.072***   0.051***   0.048***   0.054*** 
                 (0.005)    (0.005)    (0.014)    (0.010)    (0.006)    (0.008)    (0.006) 
1=female                    -0.009***  -0.004***  -0.011***  -0.010***  -0.010*** 
                            (0.001)    (0.001)    (0.001)    (0.001)    (0.001) 
I sch gr sci                0.020***   0.019***   0.019***   0.021***   0.022*** 
                            (0.000)    (0.001)    (0.001)    (0.001)    (0.001) 
N                1690451    1690451    1690451    1690451    1690451    1690451    1690451 
Fstat                                  559372     2234       2065       1742       2066 

Dep var: 1=STEM university 
1=TS             0.104***   0.072***   0.039***   0.024***   0.014***   0.012**    0.015*** 
                 (0.002)    (0.002)    (0.005)    (0.004)    (0.004)    (0.006)    (0.004) 
1=female                    -0.034***  -0.034***  -0.035***  -0.034***  -0.034*** 
                            (0.001)    (0.001)    (0.001)    (0.001)    (0.001) 
I sch gr sci                0.005***   0.005***   0.005***   0.006***   0.006*** 
                            (0.000)    (0.000)    (0.000)    (0.000)    (0.000) 
N                1690451    1690451    1690451    1690451    1690451    1690451    1690451 
Fstat                                  559372     2234       2065       1742       2066 

School Fe        No         Yes        No         No         Yes        Yes        Yes 
School trends    No         No         No         No         No         Yes        No 
School contr     No         No         Yes        Yes        No         No         No 
Stud contr       No         Yes        Yes        Yes        Yes        Yes        No 

Additional controls: year and school fixed effects; student controls: gender, Free School Meal Eligible, Special 
Education Needs, primary school grade in science, math and English; school controls: school size. All 
dependent variables are set equal to 0 if students do not continue studying or if they do not take the considered 
subjects. Robust standard errors clustered by school in parentheses. * denotes significance at 10%, ** denotes 
significance at 5%, *** denotes significance at 1%. 
Table 3: Results for other outcomes 

                 [1]             [2]             [3]              [4]            [5] 

Panel 1: KS4 (age 14) outcomes 
                 Grades                                           N. Exams 
Dep var:         KS4 Eng gr a    KS4 Math gr a   KS4 science gr   n exams ks4    n exams ks5 c 
1=TS             0.001           -0.026          -0.065**         0.438**        -0.021 
                 (0.031)         (0.028)         (0.027)          (0.210)        (0.022) 
N                1332413         1339792         1690325          1690451        860615 
ymean            0.022           0.021           0.000            10.303         3.416 

Panel 2: KS5 (age 16) outcomes 
Dep var:         1=KS5           1=KS5 math      1=KS5 Bio        1=KS5 Che      1=KS5 Phy 
1=TS             -0.009          0.035***        0.037***         0.025***       0.024*** 
                 (0.010)         (0.005)         (0.004)          (0.003)        (0.005) 
N                1690451         1690451         1690451          1690451        1690451 
ymean            0.509           0.056           0.040            0.026          0.065 

Panel 3: University outcomes b 
Dep var:         1=uni           1=grad          1=Russell        1=uni med      1=grad STEM 
1=TS             0.044*          0.041           0.022*           0.013**        0.033*** 
                 (0.025)         (0.025)         (0.011)          (0.007)        (0.011) 
N                966777          966777          966777           966777         966777 
ymean            0.318           0.207           0.046            0.019          0.034 

Additional controls: year and school fixed effects; student controls: gender, Free School Meal Eligible, 
Special Education Needs, primary school grade in science, math and English; school controls: school 
size. All dependent variables are set equal to 0 if students do not continue studying or if they do not 
take the considered subjects. Robust standard errors clustered by school in parentheses. * denotes 
significance at 10%, ** denotes significance at 5%, *** denotes significance at 1%. 

a Grades go from 0 to 7, but are standardized to have mean 0 and standard deviation 1. 
b The results on university outcomes use only the 2005-2008 sample because otherwise there would be no 
information on the graduation outcomes. 


Table 4: Gender Heterogeneity 

Dep var:     1=KS5 sci   1=Russell   1=STEM     1=medicine   1=grad     1=grad STEM 
             [1]         [2]         [3]        [4]          [5]        [6] 

Girls 
1=TS         0.047***    0.027       0.003      0.023**      0.049      0.015 
             (0.008)     (0.021)     (0.015)    (0.009)      (0.040)    (0.013) 
N            849149      486068      486068     486068       486068     486068 
ymean        0.080       0.053       0.020      0.030        0.239      0.019 

Boys 
1=TS         0.053***    0.018       0.037**    0.005        0.033      0.045*** 
             (0.007)     (0.013)     (0.017)    (0.006)      (0.029)    (0.016) 
N            841234      480646      480646     480646       480646     480646 
ymean        0.088       0.040       0.054      0.008        0.174      0.049 

Additional controls: year and school fixed effects; student controls: gender, Free School Meal Eligible, 
Special Education Needs, primary school grade in science, math and English; school controls: school 
size. All dependent variables are set equal to 0 if students do not continue studying or if they do not 
take the considered subjects. Robust standard errors clustered by school in parentheses. * denotes 
significance at 10%, ** denotes significance at 5%, *** denotes significance at 1%. 
Table 5: Summarizing effects on other subjects 

                     Δ KS5 courses                       Δ uni major 
                     All        Girls      Boys          All        Girls      Boys 
High achievers       0.197***   0.168***   0.220***      0.022***   0.021**    0.028*** 
                     (0.019)    (0.028)    (0.023)       (0.007)    (0.011)    (0.008) 
Female-dominated     -0.042***  -0.016     -0.058***     -0.007     0.014      -0.023** 
                     (0.018)    (0.027)    (0.020)       (0.008)    (0.011)    (0.010) 

The coefficients are computed as $\sum_j \beta_j q_j$, where $j$ indicates subjects, $\beta_j$ is the subject specific 
coefficient estimated in Tables A4 and A5 and $q_j$ is either ‘high achievers’ (the average primary 
school grade of students taking the course $j$ in out of sample academic years (2002-2005), standardized to 
have mean 0 and standard deviation 1) or ‘female dominated’ (the share of girls attending course $j$ 
in out of sample academic years). Standard errors are computed through the delta method. 
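
A weighted sum of subject-specific coefficients and its standard error can be sketched in Python as below. For a linear combination the delta method reduces to the quadratic form q'Vq, where V is the covariance matrix of the coefficients. The sketch assumes such a joint covariance matrix is available; how it is obtained across the separate subject regressions is not specified here, and all numbers and names are invented for illustration.

```python
import math


def weighted_effect(betas, weights, cov):
    """Return (sum_j beta_j * q_j, its standard error) for fixed weights q_j.

    betas:   subject-specific coefficients
    weights: subject characteristics q_j, treated as non-random
    cov:     assumed joint covariance matrix of the betas (list of lists)
    """
    n = len(betas)
    estimate = sum(b * q for b, q in zip(betas, weights))
    # Variance of a linear combination: q' V q.
    variance = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return estimate, math.sqrt(variance)


# Illustrative inputs only; these are not estimates from the paper.
betas = [0.5, -0.25]
weights = [2.0, 4.0]
cov = [[0.01, 0.0], [0.0, 0.04]]
estimate, std_err = weighted_effect(betas, weights, cov)
```

With a diagonal covariance matrix, as in the toy input, the variance is simply the sum of the squared weights times the coefficient variances.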
Table 6: Characterizing compliers 

Sample               Everybody    Only Girls   Only Boys 
                     [1]          [2]          [3] 

Panel 1: Entire Sample 
1st                  0.175***     0.161***     0.188*** 
                     (0.004)      (0.005)      (0.005) 
N                    1690451      849184       841267 

Panel 2: Quintiles science grade in primary school 

subgroup: 1st quintile av. primary school grade 
1st                  0.009***     0.008***     0.009*** 
                     (0.001)      (0.001)      (0.001) 
N                    339951       174093       165858 
Ratio wrt tot FS     0.051        0.050        0.048 

subgroup: 2nd quintile av. primary school grade 
1st                  0.038***     0.035***     0.041*** 
                     (0.001)      (0.002)      (0.002) 
N                    341063       171845       169218 
Ratio wrt tot FS     0.217        0.217        0.218 

subgroup: 3rd quintile av. primary school grade 
1st                  0.099***     0.092***     0.105*** 
                     (0.003)      (0.003)      (0.004) 
N                    336767       168450       168317 
Ratio wrt tot FS     0.566        0.571        0.559 

subgroup: 4th quintile av. primary school grade 
1st                  0.222***     0.208***     0.234*** 
                     (0.005)      (0.006)      (0.006) 
N                    344551       171725       172826 
Ratio wrt tot FS     1.269        1.292        1.245 

subgroup: 5th quintile av. primary school grade 
1st                  0.449***     0.417***     0.479*** 
                     (0.009)      (0.011)      (0.010) 
N                    328119       163071       165048 
Ratio wrt tot FS     2.566        2.590        2.548 

Panel 3: Socio-Economic Status 

subgroup: Low SES students (yes FSM a) 
1st                  0.084***     0.077***     0.092*** 
                     (0.002)      (0.003)      (0.003) 
N                    223375       114446       108929 
Ratio wrt tot FS     0.480        0.478        0.489 

The Table reports results from the first stage for different subgroups of the 
population. Dependent variable: a dummy equal to 1 if the student takes 
triple science. Additional controls: year and school fixed effects. Robust 
standard errors clustered by school in parentheses. * denotes significance at 
10%, ** denotes significance at 5%, *** denotes significance at 1%. 

a Free School Meal Eligible. 
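
The "Ratio wrt tot FS" rows appear to be the subgroup first-stage coefficient divided by the first-stage coefficient for the entire sample. The short Python check below applies that reading to the "Everybody" column of the quintile panel; the variable names are chosen for the example, and the interpretation itself is an assumption inferred from the reported numbers.

```python
# First-stage coefficients from the "Everybody" column of Table 6.
overall_first_stage = 0.175
quintile_first_stage = {
    "1st": 0.009,
    "2nd": 0.038,
    "3rd": 0.099,
    "4th": 0.222,
    "5th": 0.449,
}

# Ratio of each subgroup first stage to the overall first stage,
# rounded to three decimals as in the table.
ratios = {
    quintile: round(coefficient / overall_first_stage, 3)
    for quintile, coefficient in quintile_first_stage.items()
}
```

A ratio above 1 indicates a subgroup whose take-up responds more strongly to the instrument than the average student, so compliers are concentrated in the top two quintiles.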
Table 7: Selection 

                av KS2 gr a   sci KS2 gr b   1=FSM      1=KS5 sci   1=uni      1=STEM     1=grad STEM 
                [1]           [2]            [3]        [4]         [5]        [6]        [7] 

Without school specific trends 
y 11            -0.005        -0.008         0.002      0.005***    -0.002     0.001      0.001 
                (0.005)       (0.006)        (0.002)    (0.001)     (0.004)    (0.002)    (0.002) 
N               2882341       2882341        2882341    2882341     1468169    1468169    1468169 
School fe       Yes           Yes            Yes        Yes         Yes        Yes        Yes 
School trend    No            No             No         No          No         No         No 

With school specific trends 
7 li Ast        0.002         0.002          0.007**    0.004**     -0.003     0.001      0.001 
                (0.006)       (0.002)        (0.003)    (0.002)     (0.005)    (0.002)    (0.002) 
N               2285735       2285735        2285735    2285735     1309004    1309004    1309004 
School fe       Yes           Yes            Yes        Yes         Yes        Yes        Yes 
School trend    Yes           Yes            Yes        Yes         Yes        Yes        Yes 

Additional controls: year dummies, school fixed effects. Robust standard errors clustered by school in parentheses. The 
dependent variables in columns 4, 5 and 7 are set equal to 0 if students do not continue studying or if they do not take that 
subject. * denotes significance at 10%, ** denotes significance at 5%, *** denotes significance at 1%. 

a average grade in English, math and science. 
b grade in science. 


Table 8: Balancing Test 

                      OLS        OLS-Fe     Altonji    IV          IV-Fe      IV-Fe tr 
                      [1]        [2]        [3]        [4]         [5]        [6] 

Dep var: 1= Average Grade prim school a 
1=TS                  0.927***   0.788***   0.802***   0.363***    0.042      0.045 
                      (0.013)    (0.015)    (0.054)    (0.052)     (0.026)    (0.034) 
mfemale                                                0.232*** 
                                                       (0.053) 
mfsm                                                   -1.545*** 
                                                       (0.051) 
N                     1337202    1337202    1337202    1337202     1337202    1337202 
School Fe             No         Yes        No         No          Yes        Yes 
School time trends    No         No         No         No          No         Yes 

Additional controls: year dummies. Robust standard errors clustered by school in parentheses. * 
denotes significance at 10%, ** denotes significance at 5%, *** denotes significance at 1%. 

a Average grade in the KS4 exams in English, math and science. 


Table 9: Robustness: offer KS5 Science 

               Sch level regr (offer)           Stud in schools wo sixth form 
Dep var:       1=Offer KS5    1=Offer KS5       Dep var: 1=KS5 Science 
               Science        Math              All schools    only offer KS4 
               [1]            [2]               [3]            [4] 
offertriple    0.002          -0.000 
               (0.004)        (0.004) 
1=TS                                            0.050***       0.053*** 
                                                (0.006)        (0.009) 
N              5294           5294              1690451        751721 
ymean          0.477          0.467             0.084          0.060 

Columns 1 and 2 are run at the school-year level. Columns 3 and 4 are run at the 
student level. Additional controls: year and school fixed effects; student controls: 
gender, Free School Meal Eligible, Special Education Needs, primary school grade in 
science, math and English; school controls: school size. The dependent variables in 
columns 3 and 4 are set equal to 0 if students do not continue studying or if they do 
not take the considered subjects. Robust standard errors clustered by school in 
parentheses. * denotes significance at 10%, ** denotes significance at 5%, *** denotes 
significance at 1%. 


Table 10: Identification based on the second instrument 

Dep. Var.:                    1=KS5 Science                       1=STEM major 
                              [1]        [2]        [3]           [4]        [5]        [6] 
1=TS                          0.111***   0.120***   0.108**       0.028***   0.042      0.035 
                              (0.013)    (0.042)    (0.043)       (0.010)    (0.054)    (0.060) 
% reach school off TS (t-1)   0.001                               -0.002 
                              (0.002)                             (0.002) 
av. qual reach school                               0.018**                             0.010 
                                                    (0.007)                             (0.007) 
N                             2847133    2850675    2850675       2392486    2395787    2392319 
Neigh Fe                      No         Yes        Yes           No         Yes        Yes 

Additional controls: year fixed effects; student controls: gender, Free School Meal Eligible, Special 
Education Needs, primary school grade in science, math and English. All dependent variables are set 
equal to 0 if students do not continue studying or if they do not take the considered subjects. Robust 
standard errors clustered by neighbourhood in parentheses. * denotes significance at 10%, ** denotes 
significance at 5%, *** denotes significance at 1%. 


Table 11: Peers 

Dep var:               Qist a      1=KS5 sci   1=Russell   1=STEM     1=medic    1=grad     1=grad STEM 
                       [1]         [2]         [3]         [4]        [5]        [6]        [7] 
Z offer*ks2 sci q1     -0.095*** 
                       (0.011) 
Z offer*ks2 sci q2     -0.060*** 
                       (0.008) 
Z offer*ks2 sci q3     -0.031*** 
                       (0.007) 
Z offer*ks2 sci q4     0.024*** 
                       (0.007) 
Z offer*ks2 sci q5     0.055*** 
                       (0.007) 
Z offer*ks2 sci q6     0.099*** 
                       (0.008) 
1=TS                               0.053***    0.022**     0.024**    0.013*     0.042*     0.034*** 
                                   (0.006)     (0.011)     (0.012)    (0.008)    (0.025)    (0.011) 
qual peer (std)                    0.021***    0.018***    0.003      -0.001     0.014      0.004 
                                   (0.005)     (0.004)     (0.004)    (0.003)    (0.009)    (0.004) 
N                      1648926     1621765     935630      935630     935630     935630     935630 

Additional controls: year and school fixed effects; student controls: gender, Free School Meal Eligible, Special Education 
Needs, primary school grade in science, math and English; school controls: school size. All dependent variables are set 
equal to 0 if students do not continue studying or if they do not take the considered subjects. Gr sci refers to sixtiles of 
the grade distribution in the science exam at the end of primary school (KS2). F statistic: 35. 

a Quality (based on science grade in KS2, age 11) of peers in the same science class. Robust standard errors clustered by 
school in parentheses. * denotes significance at 10%, ** denotes significance at 5%, *** denotes significance at 1%. 
7 Appendix 


Table A1: Heterogeneity 

Dep var:     1=KS5 sci   1=Russell   1=STEM     1=medicine   1=grad     1=grad STEM 
             [1]         [2]         [3]        [4]          [5]        [6] 

Panel 1: Quintiles science grade in primary school 

3rd quintile 
1=TS         0.019       -0.002      -0.002     0.015        0.036      0.032 
             (0.015)     (0.035)     (0.037)    (0.028)      (0.089)    (0.036) 
N            336723      203148      203148     203148       203148     203148 
ymean        0.045       0.024       0.026      0.017        0.188      0.023 

4th quintile 
1=TS         0.032***    0.041*      0.076***   0.017        0.084*     0.086*** 
             (0.010)     (0.021)     (0.021)    (0.014)      (0.046)    (0.019) 
N            344500      197276      197276     197276       197276     197276 
ymean        0.104       0.053       0.045      0.024        0.277      0.042 

5th quintile 
1=TS         0.053***    0.018       0.010      0.005        0.016      0.012 
             (0.007)     (0.016)     (0.015)    (0.008)      (0.023)    (0.015) 
N            328076      181689      181689     181689       181689     181689 
ymean        0.254       0.146       0.097      0.040        0.414      0.090 

Panel 2: Socio-Economic Status 

High SES students (no FSM) 
1=TS         0.048***    0.024**     0.020      0.015*       0.037      0.033*** 
             (0.006)     (0.011)     (0.013)    (0.008)      (0.026)    (0.012) 
N            1431595     818880      818880     818880       818880     818880 
ymean        0.093       0.052       0.041      0.020        0.226      0.037 

Low SES students (yes FSM) 
1=TS         0.063***    -0.008      0.042      -0.003       0.100      0.024 
             (0.018)     (0.044)     (0.039)    (0.035)      (0.090)    (0.036) 
N            258804      147854      147854     147854       147854     147854 
ymean        0.034       0.015       0.018      0.010        0.103      0.016 

Additional controls: year and school fixed effects; student controls: gender, Free School Meal Eligible, 
Special Education Needs, primary school grade in science, math and English; school controls: school 
size. All dependent variables are set equal to 0 if students do not continue studying or if they do not 
take the considered subjects. Robust standard errors clustered by school in parentheses. * denotes 
significance at 10%, ** denotes significance at 5%, *** denotes significance at 1%. 
Table A2: Other balancing tests 

                RF         RF         IV         IV 
                [1]        [2]        [3]        [4] 

Dep var: 1=Grade English prim school 
Z st            -0.000     0.005 
                (0.004)    (0.005) 
1=TS                                  -0.001     -0.002 
                                      (0.023)    (0.023) 
N               1690451    1690451    1690451    1690451 
ymean           0.015      0.015      0.015      0.015 

Dep var: 1=female 
Z st            -0.002     -0.001 
                (0.001)    (0.002) 
1=TS                                  -0.009     -0.009 
                                      (0.009)    (0.009) 
N               1690451    1690451    1690451    1690451 
ymean           0.502      0.502      0.502      0.502 

Dep var: 1=FSM 
Z st            -0.000     -0.000 
                (0.001)    (0.002) 
1=TS                                  -0.001     -0.001 
                                      (0.008)    (0.008) 
N               1690451    1690451    1690451    1690451 
ymean           0.153      0.153      0.153      0.153 

School Fe       Yes        Yes        Yes        Yes 
School trend    No         Yes        No         Yes 

Additional controls: year dummies. All dependent variables are 
set equal to 0 if students do not continue studying or if they do 
not take that subject. Robust standard errors clustered by 
school in parentheses. * denotes significance at 10%, ** denotes 
significance at 5%, *** denotes significance at 1%. 
Table A3: Effect on other KS4 subjects (age 14)

                    All                     Girls                   Boys
Dep. var            [1]         [2]         [3]         [4]         [5]         [6]
                    Coeff.      Se          Coeff.      Se          Coeff.      Se
English lit         0.068**     (0.030)     0.075**     (0.030)     0.061*      (0.032)
Statistics          0.011       (0.034)     0.010       (0.038)     0.011       (0.034)
DT food             -0.027*     (0.016)     -0.047**    (0.024)     -0.009      (0.013)
DT graphics         -0.015      (0.014)     -0.002      (0.017)     -0.027      (0.017)
DT material         -0.014      (0.014)     0.000       (0.011)     -0.024      (0.022)
Art design          -0.008      (0.019)     0.001       (0.025)     -0.015      (0.019)
History             -0.032*     (0.019)     -0.045*     (0.023)     -0.022      (0.021)
Geogr               0.007       (0.020)     0.010       (0.024)     0.005       (0.022)
French              -0.015      (0.028)     -0.010      (0.033)     -0.020      (0.027)
German              -0.065***   (0.018)     -0.072***   (0.022)     -0.060***   (0.018)
Business            -0.012      (0.019)     -0.012      (0.020)     -0.014      (0.021)
Drama               0.007       (0.014)     -0.001      (0.020)     0.013       (0.014)
Inf tech            -0.034      (0.031)     -0.020      (0.032)     -0.048      (0.035)
Music               -0.001      (0.008)     -0.012      (0.011)     0.009       (0.010)
Media               -0.012      (0.022)     -0.016      (0.025)     -0.009      (0.023)
Fine art            0.005       (0.014)     0.007       (0.019)     0.004       (0.013)
Office technology   0.016       (0.028)     0.008       (0.032)     0.022       (0.028)
Applied buss        -0.001      (0.014)     -0.004      (0.015)     0.000       (0.015)
Health care         0.003       (0.011)     0.009       (0.022)     -0.002      (0.004)
Applied IT          -0.009      (0.021)     -0.009      (0.021)     -0.008      (0.024)

Each line represents a different regression. Columns 1, 3 and 5 display the coefficients on the
independent variable 1 = TS. All dependent variables are set equal to 0 if students do not take
that subject. Usual controls. Robust standard errors clustered at the school level. * denotes
significance at 10%, ** denotes significance at 5%, *** denotes significance at 1%. I exclude
math and English because they are compulsory in KS4.
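
The star convention in the notes can be checked against the reported coefficients and standard errors. The sketch below is only an illustration: it assumes a two-sided test with standard normal critical values (1.645, 1.960, 2.576), and the function name significance_stars is hypothetical. Because the table reports rounded figures, ratios close to a threshold may not reproduce the printed stars exactly.

```python
def significance_stars(coef: float, se: float) -> str:
    """Return '', '*', '**' or '***' for a two-sided test.

    Assumption: standard normal critical values are used
    (1.645 for 10%, 1.960 for 5%, 2.576 for 1%).
    """
    t = abs(coef / se)
    if t >= 2.576:
        return "***"
    if t >= 1.960:
        return "**"
    if t >= 1.645:
        return "*"
    return ""


# Coefficient and standard error pairs from column "All" of Table A3.
rows = {
    "English lit": (0.068, 0.030),
    "Statistics": (0.011, 0.034),
    "DT food": (-0.027, 0.016),
    "German": (-0.065, 0.018),
}

for subject, (coef, se) in rows.items():
    print(f"{subject}: {coef:+.3f}{significance_stars(coef, se)}")
```

For these four rows the ratios are far enough from the thresholds that the computed stars agree with the ones printed in the table.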
Table A4: Effect on other KS5 subjects (age 16)

                    All                     Girls                   Boys
Dep. var            Coeff.      Se          Coeff.      Se          Coeff.      Se
Biology             0.035***    (0.005)     0.037***    (0.008)     0.034***    (0.006)
Chemistry           0.037***    (0.004)     0.032***    (0.006)     0.040***    (0.005)
Physics             0.025***    (0.003)     0.012***    (0.003)     0.036***    (0.005)
Math                0.024***    (0.005)     0.016**     (0.007)     0.031***    (0.007)
AD textile          -0.003*     (0.002)     -0.005      (0.003)     -0.001*     (0.000)
History             0.005       (0.005)     0.004       (0.008)     0.005       (0.006)
Economics           0.003       (0.003)     0.002       (0.003)     0.004       (0.005)
Law                 -0.007**    (0.003)     -0.007      (0.005)     -0.008**    (0.004)
Psychology          -0.010*     (0.006)     -0.015      (0.011)     -0.006      (0.005)
Media film tv       -0.012***   (0.005)     -0.013*     (0.007)     -0.011**    (0.005)
German              -0.003**    (0.001)     -0.002      (0.002)     -0.003**    (0.001)
Music tech          -0.004***   (0.001)     -0.001      (0.001)     -0.008***   (0.002)
Accounting          -0.002*     (0.001)     -0.002      (0.002)     -0.002      (0.002)


Each line represents a different regression. Columns 1, 3 and 5 display the coefficients on 
the independent variable 1 = TS. All dependent variables are set equal to 0 if students do 
not continue studying or if they do not take that subject. Usual controls. Robust standard 
errors clustered at the school level. * denotes significance at 10%, ** denotes significance 
at 5%, *** denotes significance at 1%. 
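
The outcome coding described in the notes means that students who leave education stay in the estimation sample with an outcome of 0, so the estimates are not conditional on continuing. A minimal sketch of that coding follows. The function name subject_indicator and the use of None for a student who does not continue are assumptions made for this illustration and are not taken from the paper's data.

```python
from typing import Optional, Set


def subject_indicator(subjects: Optional[Set[str]], subject: str) -> int:
    """Return 1 if the student takes `subject`, otherwise 0.

    Assumption for this sketch: `subjects` is None when the student
    does not continue studying.
    """
    if subjects is None:
        return 0  # did not continue: outcome is 0, student stays in sample
    return int(subject in subjects)


students = [
    {"Physics", "Math"},  # continues and takes Physics
    {"History", "Law"},   # continues without Physics
    None,                 # does not continue studying
]

physics = [subject_indicator(s, "Physics") for s in students]
print(physics)  # [1, 0, 0]
```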
Table A5: Effect on other university majors (age 18)

                    All                     Girls                   Boys
Dep. variables      Coeff.      Se          Coeff.      Se          Coeff.      Se
Physics             0.006***    (0.002)     0.001       (0.003)     0.009***    (0.003)
Math                0.001       (0.002)     -0.002      (0.002)     0.003       (0.004)
Engineering         0.007***    (0.002)     0.003**     (0.001)     0.011***    (0.003)
Biology             -0.001      (0.003)     -0.001      (0.005)     -0.002      (0.004)
Veterinary agric    -0.001      (0.001)     -0.001      (0.002)     0.000       (0.001)
Computer sci        -0.001      (0.001)     -0.001      (0.001)     -0.000      (0.002)
Technology          -0.000      (0.001)     -0.000      (0.001)     -0.000      (0.001)
General science     -0.000      (0.001)     -0.001      (0.002)     0.000       (0.001)
Medicine            0.003*      (0.001)     0.006**     (0.002)     0.001       (0.001)
Allied medicine     0.004*      (0.002)     0.008*      (0.004)     0.000       (0.002)
Architecture        -0.003***   (0.001)     -0.002*     (0.001)     -0.004**    (0.002)
Other languages     0.000       (0.000)     -0.000      (0.001)     0.000       (0.001)
History             0.001       (0.002)     0.003       (0.003)     -0.001      (0.002)
Art design          -0.000      (0.003)     0.001       (0.005)     -0.002      (0.003)
Education           -0.001      (0.002)     -0.001      (0.004)     -0.001      (0.001)
Soc studies         0.003       (0.003)     0.005       (0.005)     0.001       (0.003)
Law                 -0.004*     (0.002)     -0.006*     (0.003)     -0.002      (0.002)
Business            0.001       (0.003)     0.001       (0.004)     -0.000      (0.004)
Communication       0.000       (0.002)     0.001       (0.003)     -0.001      (0.002)
Ling classic        0.005**     (0.002)     0.004       (0.004)     0.006***    (0.002)
Eu languages        -0.000      (0.001)     -0.000      (0.002)     -0.000      (0.001)


Each line represents a different regression. Columns 1, 3 and 5 display the coefficients on 
the independent variable 1 = TS. All dependent variables are set equal to 0 if students do 
not continue studying or if they do not take that subject. Usual controls. Robust standard 
errors clustered at the school level. * denotes significance at 10%, ** denotes significance at 
5%, *** denotes significance at 1%. 
Table A6: Robustness: exclusion restriction

Dep var:            1=KS5 sci   1=Russell   1=STEM      1=medicine  1=grad      1=grad STEM
                    [1]         [2]         [3]         [4]         [5]         [6]
1=TS                0.057***    0.024*      0.022       0.010       0.039       0.026**
                    (0.007)     (0.014)     (0.013)     (0.009)     (0.028)     (0.012)
N                   1613226     948058      948058      948058      948058      948058
ymean


The sample includes only schools where the triple science class is not likely to be oversubscribed (class
size not around a multiple of 30). Additional controls: year and school fixed effects; student controls:
gender, Free School Meal Eligible, Special Education Needs, primary school grade in science, math and
English; school controls: school size. The dependent variables in columns 3, 4, 5 and 6 are set equal
to 0 if students do not continue studying or if they do not take the considered subjects. Robust
standard errors clustered by school in parentheses. * denotes significance at 10%, ** denotes
significance at 5%, *** denotes significance at 1%.


Table A7: Teachers

Dep. variable:      N teachers  N qualified teachers
                    [1]         [2]
1=TS                1.604       1.577
                    (1.267)     (1.249)
N                   1022489     1022489
ymean               70.567      66.654


Additional controls: year and school fixed effects; student controls:
gender, Free School Meal Eligible, Special Education Needs, primary
school grade in science, math and English; school controls: school
size. Robust standard errors clustered by school in parentheses.
* denotes significance at 10%, ** denotes significance at 5%,
*** denotes significance at 1%.